Abstract. The phase transition in the size of the giant component in random graphs is one of the most well-studied phenomena in random graph theory. For hypergraphs, there are many possible generalisations of the notion of a component, and for all but the simplest example, the phase transition phenomenon was first proved in [5]. In this paper we build on this and determine the asymptotic size of the unique giant component.
1. Introduction and main results

Random graph theory was founded by Erdős and Rényi in a seminal series of papers from 1959 to 1968. One of the earliest, and perhaps the most important, results concerned the phase transition in the size of the largest component: a very small change in the number of edges present in the random graph dramatically alters the size of the largest component, giving birth to the giant component [6]. Over the years, this result has been refined and improved, and the properties of the largest component at or near the critical threshold are now very well understood. In modern terminology, and incorporating the strengthenings due to Bollobás [2] and Łuczak [?], we may state the result as follows.

Let $G(n,p)$ denote the random graph on $n$ vertices in which each pair of vertices forms an edge with probability $p$ independently. We consider the asymptotic properties of $G(n,p)$ as $n$ tends to infinity. By the phrase with high probability (or whp) we mean with probability tending to 1 as $n$ tends to infinity.

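Theorem 1 ([6], [2], [?]). Let $\varepsilon = \varepsilon(n) > 0$ satisfy $\varepsilon \to 0$ and $\varepsilon^3 n \to \infty$, as $n \to \infty$.

(a) If $p = \frac{1-\varepsilon}{n}$, then whp all components of $G(n,p)$ have size at most $O(\varepsilon^{-2}\log(\varepsilon^3 n))$.

(b) If $p = \frac{1+\varepsilon}{n}$, then whp the size of the largest component of $G(n,p)$ is $(1 \pm o(1))2\varepsilon n$, while all other components have size at most $O(\varepsilon^{-2}\log(\varepsilon^3 n))$.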
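The following Python sketch illustrates Theorem 1(b) numerically; it is an illustration under stated assumptions, not part of the proof. It samples $G(n,p)$ with $p = (1+\varepsilon)/n$ by the geometric-skipping method of Batagelj and Brandes, finds the largest component with union-find, and compares its relative size with $\rho$, the positive solution of $1-\rho = e^{-(1+\varepsilon)\rho}$ (the survival probability of a Poisson$(1+\varepsilon)$ branching process), which behaves like $2\varepsilon$ only as $\varepsilon \to 0$. All function names are hypothetical.

```python
import math
import random


def gnp_edges(n, p, rng):
    """Yield the edges of G(n, p) by geometric skipping (Batagelj-Brandes),
    in time proportional to n plus the number of edges; assumes 0 < p < 1."""
    log_q = math.log(1.0 - p)
    v, w = 1, -1
    while v < n:
        # Jump over a geometrically distributed run of non-edges.
        w += 1 + int(math.log(1.0 - rng.random()) / log_q)
        while w >= v and v < n:
            w -= v
            v += 1
        if v < n:
            yield v, w


def largest_component(n, edges):
    """Size of the largest connected component, computed with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


def poisson_survival(c, iters=1000):
    """Positive solution of 1 - rho = exp(-c * rho) for c > 1, iterating from 1."""
    rho = 1.0
    for _ in range(iters):
        rho = 1.0 - math.exp(-c * rho)
    return rho


n, eps = 100_000, 0.2
rng = random.Random(12345)
frac = largest_component(n, gnp_edges(n, (1 + eps) / n, rng)) / n
rho = poisson_survival(1 + eps)
print(f"largest/n = {frac:.4f}, rho = {rho:.4f}, 2*eps = {2 * eps:.4f}")
```

For a fixed $\varepsilon$ such as $0.2$ the simulated fraction tracks $\rho$ rather than $2\varepsilon$; the approximation $2\varepsilon n$ in Theorem 1 is the leading term as $\varepsilon \to 0$.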
Considerably more than this is by now known, but in this paper our focus is on a generalisation of this result. While this graph result (and much more) has been known for several decades, for hypergraphs relatively little was known until recently.

For some integer $k \ge 2$, a $k$-uniform hypergraph consists of a set $V$ of vertices and a set $E \subseteq \binom{V}{k}$ of edges, each of which consists of $k$ vertices. (The case $k = 2$ corresponds to a graph.) We consider the natural analogue of the $G(n,p)$ model: let $\mathcal{H}^k(n,p)$ be a random $k$-uniform hypergraph with vertex set $V = [n]$ in which each $k$-tuple of vertices is an edge with probability $p$ independently.

Before we can state a generalisation of Theorem 1 we need to know what we mean by a component in hypergraphs. For $k \ge 3$ there is more than one possible definition.

Two $j$-sets of vertices (i.e. $j$-element subsets of the vertex set $V$) $J_1, J_2$ are said to be $j$-connected if there is a sequence $E_1, \dots, E_\ell$ of (hyper-)edges such that $J_1 \subseteq E_1$, $J_2 \subseteq E_\ell$ and $|E_i \cap E_{i+1}| \ge j$ for each $i = 1, \dots, \ell-1$. This forms an equivalence relation on $j$-sets. A $j$-component is an equivalence class of this relation, i.e. a maximal set of pairwise $j$-connected $j$-sets.
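The definition can be made concrete with a short sketch (hypothetical function name, assuming every edge has at least $j$ vertices). All $j$-subsets of one edge are pairwise $j$-connected, and two edges sharing at least $j$ vertices share a $j$-subset, so merging the $j$-subsets of each edge with union-find yields exactly the $j$-components. The example shows how the same hypergraph can be 1-connected without being 2-connected.

```python
from itertools import combinations


def j_components(edges, j):
    """Return the j-components spanned by `edges` as a list of sets of j-sets.

    j-sets contained in no edge are omitted (each would form a class on its own)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in edges:
        jsets = [frozenset(s) for s in combinations(sorted(edge), j)]
        root = find(jsets[0])
        for s in jsets[1:]:
            other = find(s)
            if other != root:
                parent[other] = root  # merge the class of s into that of jsets[0]
    classes = {}
    for s in list(parent):
        classes.setdefault(find(s), set()).add(s)
    return list(classes.values())


H = [(1, 2, 3), (2, 3, 4), (4, 5, 6)]          # a small 3-uniform example
print([len(c) for c in j_components(H, 1)])    # vertex-connectivity (j = 1)
print([len(c) for c in j_components(H, 2)])    # 2-connectivity
```

Here the edges $\{2,3,4\}$ and $\{4,5,6\}$ share only one vertex, so they lie in a common 1-component but in different 2-components.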

The case j = 1 is sometimes known as vertex-connectivity. This case is by far the 
most studied, not necessarily because it is a more natural definition, but because 
it is usually substantially easier to understand and analyse. 

In [5], it was determined that for all $1 \le j \le k-1$, the phase transition for the largest $j$-component in random hypergraphs occurs at the critical probability threshold of
$$p_0 := \frac{1}{\left(\binom{k}{j}-1\right)\binom{n-j}{k-j}}.$$
However, in that paper the size of the largest $j$-component after the phase transition was not determined. In this paper we extend that result to determine its asymptotic size, and also show that it is unique in the sense that the size of the second-largest $j$-component is of smaller order. Once we know that it is unique, we refer to it as the giant component.

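Theorem 2. Let $1 \le j \le k-1$ and let $\varepsilon = \varepsilon(n) > 0$ satisfy $\varepsilon \to 0$, $\varepsilon^3 n^j \to \infty$ and $\varepsilon^2 n^{1-2\delta} \to \infty$, as $n \to \infty$, for some constant $\delta > 0$.

(a) If $p = (1-\varepsilon)p_0$, then whp all $j$-components of $\mathcal{H}^k(n,p)$ have size at most $O(\varepsilon^{-2}\log n)$.

(b) If $p = (1+\varepsilon)p_0$, then whp the size of the largest $j$-component of $\mathcal{H}^k(n,p)$ is $(1 \pm o(1))\frac{2\varepsilon}{\binom{k}{j}-1}\binom{n}{j}$, while all other $j$-components have size at most $o(\varepsilon n^j)$.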
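A small helper makes the threshold $p_0$ and the leading-order prediction of Theorem 2(b) concrete. This is an illustrative sketch with hypothetical function names; it evaluates only the main term and ignores the $(1 \pm o(1))$ factor.

```python
from math import comb


def critical_probability(n, k, j):
    """p_0 = 1 / ((C(k, j) - 1) * C(n - j, k - j)), for 1 <= j < k."""
    return 1 / ((comb(k, j) - 1) * comb(n - j, k - j))


def predicted_giant_size(n, k, j, eps):
    """Leading term 2*eps / (C(k, j) - 1) * C(n, j) of Theorem 2(b)."""
    return 2 * eps / (comb(k, j) - 1) * comb(n, j)


# Graph case (k = 2, j = 1): p_0 = 1/(n - 1) and the prediction is 2*eps*n.
print(critical_probability(1000, 2, 1), predicted_giant_size(1000, 2, 1, 0.05))
# A genuinely hypergraph case: 2-components of a 3-uniform hypergraph.
print(critical_probability(1000, 3, 2), predicted_giant_size(1000, 3, 2, 0.05))
```

In the graph case $p_0 = 1/(n-1)$, which is asymptotically the same as the threshold $1/n$ of Theorem 1.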

While preparing this paper, we learnt that independently Lu and Peng [?] have also claimed to have a proof of a similar result, although for the range when $\varepsilon$ is bounded from below by a constant (i.e. $p$ is much further from the critical value).

We note that when $j = 1$ the conditions on $\varepsilon$ are simply $n^{-1/3} \ll \varepsilon \ll 1$. This provides the best possible range for $\varepsilon$. However, for larger $j$, the third condition takes over from the second and we require $n^{-1/2+\delta} \ll \varepsilon \ll 1$. This condition arises from our proof method. We discuss the critical window in more detail in Section 8.

The case $k = 2$ (then automatically $j = 1$) is simply Theorem 1. The case $j = 1$ for general $k$ was already proved by Schmidt-Pruzan and Shamir [16], and indeed much more is known for that case, as will also be described in Section 8.

Our proof works for all $j, k \in \mathbb{N}$ with $1 \le j < k$ (i.e. all permissible pairs $j,k$). Indeed, for $j = 1$ this paper provides a new, short proof of the result. For general $j$, in the supercritical regime (when $p = (1+\varepsilon)p_0$), we need an additional lemma (the smooth boundary lemma, Lemma 16), which is the main original contribution of this paper (the rest of the argument is in essence very similar to a recent proof of the graph case by Bollobás and Riordan [4], though slightly more complex in full generality). We will prove Theorem 2 using a breadth-first search algorithm to explore the component containing an initial $j$-set. We refer to the collection of $j$-sets at fixed distance from the initial $j$-set as a generation. Roughly speaking, the smooth boundary lemma says that during the (supercritical) search process, most generations are “smooth” in the sense that any set $L$ of size at most $j-1$ lies in approximately the “right” number of $j$-sets of the generation.

In order to state the lemma, we need a few additional definitions. For this introduction we will be deliberately vague about these definitions and the statement of the smooth boundary lemma, which appears in this section as Lemma 3. All definitions and the exact form of the smooth boundary lemma (Lemma 16) will be restated more explicitly in Section 5.

For any given $j$-set $J$ we explore its component $C_J$ via a breadth-first search algorithm. Let $\partial C_J(i)$ denote the $i$-th generation of this process, which we sometimes refer to as the boundary of the component after $i$ generations. For any $0 \le \ell \le j-1$ let $i_0(\ell)$ be the first generation $i$ for which $|\partial C_J(i)|$ is significantly larger than $n^\ell$, and for an $\ell$-set $L$ let $d_L(\partial C_J(i))$ be the number of $j$-sets of $\partial C_J(i)$ that contain $L$.

Let $i_1$ denote the generation at which the search process hits one of three stopping conditions, which will be stated explicitly later (see Section ??).

Lemma 3 (Smooth boundary lemma, simplified form). Let $\varepsilon, p$ be as in Theorem 2(b). Then with probability at least $1 - \exp(-n^{\Theta(1)})$, for all $J, \ell, L, i$ such that

• $J$ is a $j$-set of vertices;

• $0 \le \ell \le j-1$;

• $L$ is an $\ell$-set of vertices;

• $i_0(\ell) + \Theta(\log n) \le i \le i_1$,

the following holds:
$$d_L(\partial C_J(i)) = (1 \pm o(1))\frac{\binom{j}{\ell}\,|\partial C_J(i)|}{\binom{n}{\ell}}.$$
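Double counting explains why $\binom{j}{\ell}|\partial C_J(i)|/\binom{n}{\ell}$ is the “right” value: every $j$-set contains exactly $\binom{j}{\ell}$ sets of size $\ell$, so this quantity is precisely the average of $d_L(\partial C_J(i))$ over all $\ell$-sets $L$. The lemma asserts much more, namely that every $L$ is close to this average. The sketch below checks the averaging identity and reports the spread; the function name is hypothetical, and a uniformly random family of $j$-sets merely stands in for a generation of the search process.

```python
import random
from itertools import combinations
from math import comb


def ell_degrees(family, n, ell):
    """Map every ell-subset L of {0, ..., n-1} to d_L(family), the number of
    j-sets in `family` that contain L."""
    deg = {L: 0 for L in combinations(range(n), ell)}
    for J in family:
        for L in combinations(sorted(J), ell):
            deg[L] += 1
    return deg


n, j, ell = 40, 3, 1
rng = random.Random(1)
# Hypothetical stand-in for a generation: 2000 distinct uniformly random j-sets.
family = rng.sample(list(combinations(range(n), j)), 2000)
deg = ell_degrees(family, n, ell)
target = comb(j, ell) * len(family) / comb(n, ell)   # the "right" value
print(target, min(deg.values()), max(deg.values()))
```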

Lemma 3 (or its more explicit form, Lemma 16) is an interesting result in itself, giving strong information about the structure of reasonably large components. We believe that it will prove to be a useful tool applicable to many other results in the field of random hypergraphs.

2. Overview and proof methods 

In this section we will be informal about quantifications and use terms like “large” 
and “many” without properly defining them. The rigorous definitions can be found 
in the actual proofs. 

2.1. Intuition: Branching processes. As is the case for graphs, there is a very simple heuristic argument which suggests at what threshold we expect the phase transition to occur and how large we expect the largest component to be. We consider exploring a $j$-component of $\mathcal{H}^k(n,p)$ via a breadth-first process: we begin with one $j$-set, then find all edges containing that $j$-set, thus “discovering” any further $j$-sets that they contain, then from each of these new $j$-sets in turn we look for any more edges containing them, and so on.

The first $j$-set we consider is contained in $\binom{n-j}{k-j}$ $k$-sets, all of which could potentially be edges. Later on, when considering an active $j$-set, we may already have queried some of the $k$-sets containing it. However, early in the process, $\binom{n-j}{k-j}$ is a good approximation (and certainly an upper bound) for the number of queries which we make from any $j$-set. Each of these queries results in an edge with probability $p$, and for each edge we generally discover $\binom{k}{j} - 1$ new $j$-sets (it could be fewer if some of these $j$-sets were already discovered, but intuitively this should not happen often).

We may therefore approximate the search process by a branching process $\mathcal{T}^*$: here we represent $j$-sets by individuals, and for each individual the number of its children is given by a random variable with distribution $\mathrm{Bi}\left(\binom{n-j}{k-j}, p\right)$ multiplied by $\binom{k}{j} - 1$.
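The heuristic can be tried out with a small Monte Carlo sketch of this branching process. The function names, the choice $N = 20$ in place of $\binom{n-j}{k-j}$, and the cut-off that counts a process reaching 100 individuals in one generation as surviving are all assumptions made for illustration.

```python
import random


def survives(c0, N, p, rng, cap=100):
    """Simulate one copy of a branching process whose offspring distribution is
    c0 * Bin(N, p), generation by generation. Reaching `cap` individuals in one
    generation is treated as survival (a heuristic cut-off)."""
    size = 1
    while 0 < size < cap:
        size = sum(c0 * sum(rng.random() < p for _ in range(N))
                   for _ in range(size))
    return size >= cap


def survival_estimate(c0, N, eps, trials=1000, seed=0):
    """Monte Carlo estimate of the survival probability when the expected
    number of children c0 * N * p equals 1 + eps."""
    rng = random.Random(seed)
    p = (1 + eps) / (c0 * N)
    return sum(survives(c0, N, p, rng) for _ in range(trials)) / trials


# k = 3, j = 2 gives C(k, j) - 1 = 2; N = 20 plays the role of C(n - j, k - j).
est = survival_estimate(c0=2, N=20, eps=0.3)
print(est, 2 * 0.3 / 2)   # estimate versus the leading-order value 2*eps/(C(k,j)-1)
```

Because $\varepsilon = 0.3$ is not small, the estimate sits somewhat below the leading-order value $2\varepsilon/(\binom{k}{j}-1)$; the agreement improves as $\varepsilon \to 0$.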

The expected number of children of each individual is $\left(\binom{k}{j}-1\right)\binom{n-j}{k-j}p$. If this
number is smaller than 1, then the process always has a tendency to shrink and 
will therefore die out with probability 1. This roughly corresponds to the initial 
j-set being in a small component. On the other hand, if the expected number of 
children is larger than 1, the process has a tendency to grow and therefore has 
a certain positive probability of surviving indefinitely, corresponding to the initial 
j-set being in a large component. 

The expected number of children is exactly 1 when $p = p_0$, which is why we expect the threshold to be located there. Furthermore, for $p = (1+\varepsilon)p_0$ the survival probability $\rho$ is asymptotically $\frac{2\varepsilon}{\binom{k}{j}-1}$ (this is calculated in Appendix B). This tells us that we should expect about $\rho\binom{n}{j}$ of the $j$-sets of $\mathcal{H}^k(n,p)$ to be contained in large components. Moreover, large components should intuitively merge quickly and thus form a unique giant component.

2.2. Proof outline. As is often the case in such theorems, one half of Theorem 2 is easy to prove. Specifically, part (a) can be quickly proved using the ideas implemented by Krivelevich and Sudakov [10] for the graph case.

We now give an outline of the proof of Theorem 2(b). Much of this is similar to the argument for graphs given by Bollobás and Riordan [4]; we will highlight the points at which new ideas are required. From now on, whenever we refer to a “component” we mean a $j$-component.

The main difficulty is to calculate the number X of j-sets in large components; 
if we can prove that X is well-concentrated, then we can show that almost all of 
these j-sets lie in one large component using a standard sprinkling argument. 

The first issue that we encounter is: if we find an edge, how many new $j$-sets do we discover? It could be as many as $\binom{k}{j} - 1$, but on the other hand some of these $j$-sets may already have been discovered, so perhaps only one of these is genuinely new. (Any $k$-set which would give no new $j$-set should not be queried at all.) This is important for two reasons: firstly, it is important for the survival probability of the branching process approximation; and secondly, it may have a significant impact on the size of the component we discover.

Early on in the breadth-first process (since a very small proportion of $j$-sets have already been discovered) we should discover $\binom{k}{j} - 1$ new $j$-sets for almost every edge. For an upper coupling, we will simply assume that we discover $\binom{k}{j} - 1$ new $j$-sets for each edge, and couple this process with the branching process $\mathcal{T}^*$. In order to construct a lower coupling we have to be a bit more careful; we only query a $k$-set if it contains $\binom{k}{j} - 1$ previously undiscovered $j$-sets, thus ensuring that we can define a lower coupling with a branching process $\mathcal{T}_*$ which has a structure similar to $\mathcal{T}^*$. The process $\mathcal{T}_*$ will be formally introduced in Section 3. It follows from the bounded degree lemma in [5] (restated in this paper as Lemma 6) that these two couplings have essentially the same behaviour. For this overview we therefore consider only $\mathcal{T}^*$.

The probability that a $j$-set lies in a large component, and therefore contributes to $X$, is approximately the survival probability of the branching process $\mathcal{T}^*$, which as we will see is approximately $\frac{2\varepsilon}{\binom{k}{j}-1}$. Thus we have $\mathbb{E}(X) \sim \frac{2\varepsilon}{\binom{k}{j}-1}\binom{n}{j}$. For the second moment we need to consider the probability that two distinct $j$-sets are both in large components. As before, we grow a (partial) component $C_{J_1}$ from a $j$-set $J_1$ until one of the following three stopping conditions is reached.


(i) The component is fully explored;

(ii) We have discovered “many” $j$-sets;

(iii) We have “fairly many” $j$-sets that are currently active (i.e. are in the boundary $\partial C_{J_1}$).


Since we are interested in the probability that both $j$-sets lie in large components, we may assume that we do not stop due to condition (i). Next, note that, if stopping condition (iii) is applied, then with high probability $J_1$ lies in a large component. This is not hard to prove using the branching process approximation: if we have fairly many active individuals, then it is highly probable that the branching process will survive. Hence stopping conditions (ii) or (iii) are essentially only applied if the component of $J_1$ is large. This happens with probability roughly $\frac{2\varepsilon}{\binom{k}{j}-1}$.

We then delete all of the $j$-sets contained in $C_{J_1}$ from $\mathcal{H}^k(n,p)$ and begin growing a component $C_{J_2}$ from the second $j$-set $J_2$ (assuming $J_2$ itself has not been deleted), where any $k$-set containing a deleted $j$-set can now no longer be queried. Deleting these $j$-sets ensures that the new search process is independent of the old process, albeit in a restricted hypergraph. Furthermore, it still follows from the bounded degree lemma that $\mathcal{T}_*$ will be a lower coupling. Once again, we stop the process if $C_{J_2}$ is fully explored or it becomes large, and the probability that it becomes large is approximately $\frac{2\varepsilon}{\binom{k}{j}-1}$.


However, since we deleted some $j$-sets, it might happen that the search process for $J_2$ stays small even though the component of $J_2$ in $\mathcal{H}^k(n,p)$ is large. In that case there is an edge in $\mathcal{H}^k(n,p)$ containing a $j$-set of $C_{J_1}$ which was active when we deleted it and a $j$-set of $C_{J_2}$, i.e. an edge containing $j$-sets from both $\partial C_{J_1}$ and $C_{J_2}$. We would like to show that the expected number of such edges is $o(1)$, or equivalently, that the number of $k$-sets containing two $j$-sets as above is $o(1/p)$, and thus with high probability no such edge exists.

This is the point at which the proof requires new ideas not needed in the graph case. For it is not enough simply to count the number of pairs of $j$-sets, one from $\partial C_{J_1}$ and one from $C_{J_2}$, for the following reason: given two $j$-sets, how many $k$-sets contain both of them? The answer is heavily dependent on the size of their intersection. Increasing the size of the intersection by just one may lead to an additional factor of $n$ in the number of such $k$-sets.
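To quantify this: if two $j$-sets meet in exactly $\ell$ vertices, their union has $2j-\ell$ vertices, and a $k$-set contains both of them exactly when it contains this union, so the number of such $k$-sets is $\binom{n-2j+\ell}{k-2j+\ell}$ (and $0$ if $2j-\ell > k$). The hypothetical helper below evaluates this count.

```python
from math import comb


def ksets_containing_both(n, k, j, ell):
    """Number of k-subsets of an n-set containing two fixed j-sets that share
    exactly ell vertices. Their union U has 2j - ell vertices; a k-set contains
    both j-sets iff it contains U, leaving k - |U| vertices to choose freely."""
    u = 2 * j - ell
    if u > k:
        return 0
    return comb(n - u, k - u)


n, k, j = 1000, 5, 2
for ell in range(j):
    print(ell, ksets_containing_both(n, k, j, ell))
```

Increasing $\ell$ by one multiplies the count by $(n-u+1)/(k-u+1)$ with $u = 2j-\ell$, which is roughly $n/2$ in this example.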
We therefore need to know that $C_{J_2}$ and $\partial C_{J_1}$ behave approximately as expected with respect to the size of intersections of $j$-sets chosen one from each. For this we prove the smooth boundary lemma (Lemma 3, or rather its more precise form Lemma 16). It states that, with (exponentially) high probability, every set $L$ of size $1 \le \ell \le j-1$ lies in approximately the “right” number of $j$-sets of $\partial C_{J_1}$.

This enables us to complete the proof by considering, for each $j$-set $J$ of $C_{J_2}$, the number of $j$-sets of $\partial C_{J_1}$ which intersect $J$ in some subset $L$, and thus calculate the total number of $k$-sets which we have not queried because of deletions. This shows that with high probability we have not missed any edges from $C_{J_2}$, and thus the probability that $J_1$ and $J_2$ both lie in large components is approximately $4\varepsilon^2/\left(\binom{k}{j}-1\right)^2$, which multiplied by the number of pairs $(J_1, J_2)$ shows that the second moment is small enough to apply Chebyshev's inequality and deduce that $X$ is concentrated around its expectation.

Note that Lemma 3 (respectively Lemma 16) is trivial in the case $j = 1$, since then only $\ell = 0$ occurs and $d_\emptyset(\partial C_J(i)) = |\partial C_J(i)|$. In this case the proof of the main result therefore becomes substantially shorter. This suggests a concrete mathematical reason why the case of vertex-connectivity is genuinely easier than the more general case and not simply easier to visualise.

3. Notation and setup 

We fix $j, k \in \mathbb{N}$ satisfying $1 \le j < k$ for the remainder of the paper. Throughout the paper we omit floors and ceilings when they do not significantly affect the argument. We use $\log$ to denote the natural logarithm (i.e. base $e$).

Given a hypergraph $\mathcal{H}$ and a set $L$ of vertices of $\mathcal{H}$, the degree of $L$ in $\mathcal{H}$, denoted $d_L(\mathcal{H})$, is the number of edges of $\mathcal{H}$ which contain $L$. For a natural number $\ell$, the maximum $\ell$-degree of $\mathcal{H}$, denoted $\Delta_\ell(\mathcal{H})$, is the maximum of $d_L(\mathcal{H})$ over all sets $L$ of $\ell$ vertices of $\mathcal{H}$. When $\ell = 0$, this is simply the number of edges of $\mathcal{H}$.
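The two degree notions can be computed directly; the sketch below uses hypothetical function names and represents a hypergraph as a list of vertex tuples without repeated edges. Only $\ell$-sets lying inside some edge can have positive degree, so $\Delta_\ell$ is found by counting the $\ell$-subsets of the edges.

```python
from collections import Counter
from itertools import combinations


def degree(edges, L):
    """d_L(H): the number of edges of H containing the vertex set L."""
    L = set(L)
    return sum(1 for e in edges if L <= set(e))


def max_degree(edges, ell):
    """Delta_ell(H): the maximum of d_L(H) over all ell-sets L (0 if H is empty)."""
    counts = Counter(L for e in edges for L in combinations(sorted(e), ell))
    return max(counts.values(), default=0)


H = [(1, 2, 3), (1, 2, 4), (1, 5, 6)]
print(degree(H, (1, 2)), max_degree(H, 1), max_degree(H, 0))
```

For $\ell = 0$ every edge contributes the empty set once, so `max_degree(H, 0)` is the number of edges, matching the convention above.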

All asymptotics in the paper will be as $n \to \infty$. In particular, we use the phrase with high probability, or whp, to mean with probability tending to 1 as $n \to \infty$. We also use the notation $f(n) \sim g(n)$ to mean $f(n) = (1 \pm o(1))g(n)$. By the notation $x \ll y$ we mean that $x = o(y)$.

Given two random variables $X, Y$, we say that $X$ is stochastically dominated by $Y$ if $\Pr(X > z) \le \Pr(Y > z)$ for all $z \in \mathbb{R}$.

As may be apparent from the introduction, the constant $\binom{k}{j} - 1$ will be an important one, appearing many times during the proof. For convenience, we define
$$c_0 := \binom{k}{j} - 1.$$

We fix a constant $\delta$ satisfying $0 < \delta < 1/6$, and think of it as an arbitrarily small constant; in general our results become stronger for smaller $\delta$.

We will have various further parameters throughout the paper satisfying the following hierarchies:
$$n^{-j/3},\ n^{-1/2+\delta} \ll \lambda \ll \varepsilon_* \ll \varepsilon \ll 1$$
and
$$\varepsilon_*,\ n^{-\delta/24} \ll \eta \ll \gamma_0/\log n \ll 1/\log n.$$
We further define $\gamma_\ell := 8^\ell\gamma_0$ for $\ell = 1, \dots, j-1$.

Note in particular that for any $\varepsilon$ satisfying the conditions of Theorem 2 we can choose the remaining parameters such that this hierarchy is satisfied.
We will explore $j$-components in $\mathcal{H}^k(n,p)$ via a breadth-first search algorithm, i.e. we begin with a $j$-set and query all $k$-sets which contain it to determine whether they form edges. For any that do, the further $j$-sets they contain are neighbours of our starting $j$-set, and for each of these in turn we query $k$-sets containing them to discover whether they form an edge, and so on.

During this process we denote a j-set as: 

• neutral if it has not yet been visited by the search process; 

• active if it has been visited, but not yet fully queried; 

• explored if it has been fully queried. 

We call a $j$-set discovered if it is either active or explored, i.e. not neutral. We will consider a standard breadth-first search algorithm:

• BFS($J$): Starting from an initial $j$-set $J$, we take the active $j$-sets in breadth-first order and, from each of them, query any previously unqueried $k$-set containing it which also contains a still-neutral $j$-set. If the set of active $j$-sets becomes empty we terminate the algorithm.

If the context clarifies the initially active $j$-set we usually write BFS instead of BFS($J$).
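A minimal sketch of BFS($J$) follows, assuming the edges of $\mathcal{H}^k(n,p)$ are revealed lazily: a $k$-set becomes an edge with probability $p$ at the moment it is first queried, and is never queried again. The function name and its interface are hypothetical; it returns the generation sizes $|\partial C_J(0)|, |\partial C_J(1)|, \dots$, whose sum is the size of the explored $j$-component.

```python
import random
from itertools import combinations


def bfs_generations(n, k, j, p, start, seed=0):
    """Explore the j-component of `start` in H^k(n, p), generation by generation.

    Each active j-set J queries every k-set K that contains J, has not been
    queried before, and contains at least one neutral j-set; K is an edge with
    probability p, independently. The neutral j-subsets of each edge found
    become active in the next generation."""
    rng = random.Random(seed)
    start = frozenset(start)
    discovered = {start}               # active or explored j-sets
    queried = set()                    # k-sets whose status is already revealed
    generation, sizes = [start], []
    while generation:
        sizes.append(len(generation))
        next_generation = []
        for J in generation:
            outside = [v for v in range(n) if v not in J]
            for extra in combinations(outside, k - j):
                K = J | frozenset(extra)
                if K in queried:
                    continue
                neutral = [frozenset(S) for S in combinations(sorted(K), j)
                           if frozenset(S) not in discovered]
                if not neutral:        # no neutral j-set: K is not queried
                    continue
                queried.add(K)
                if rng.random() < p:   # K turns out to be an edge
                    discovered.update(neutral)
                    next_generation.extend(neutral)
        generation = next_generation
    return sizes


n, k, j = 12, 3, 2
p0 = 1 / ((3 - 1) * (n - j))           # C(3,2) - 1 = 2 and C(n - 2, 1) = n - 2
sizes = bfs_generations(n, k, j, 1.5 * p0, start=(0, 1), seed=7)
print(sizes, "component size:", sum(sizes))
```

With $p = 1$ every queried $k$-set is an edge and the search discovers all $\binom{n}{j}$ $j$-sets, which is a convenient sanity check.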

We will want to approximate this search process by branching processes:

• $\mathcal{T}^*$ is a branching process in which each individual (which represents a $j$-set) has $c_0 \cdot \mathrm{Bi}\left(\binom{n-j}{k-j}, p\right)$ children independently.

• $\mathcal{T}_*$ is a branching process in which each individual (which represents a $j$-set) has $c_0 \cdot \mathrm{Bi}\left((1-\varepsilon_*)\binom{n-j}{k-j}, p\right)$ children independently.

By the notation $c \cdot X$, for a constant $c$ and a random variable $X$, we mean a random variable $Y$ with distribution given by $\Pr(Y = ci) = \Pr(X = i)$ for any $i$. (Alternatively, $c \cdot X$ may be considered as consisting of $c$ identical copies of $X$; note that it does not consist of $c$ independent copies of $X$.)
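The distinction matters for the spread of the offspring distribution: $c \cdot X$ has the same mean as a sum of $c$ independent copies of $X$, but variance $c^2\,\mathrm{Var}(X)$ instead of $c\,\mathrm{Var}(X)$. (Heuristically, a near-critical process with mean $1+\varepsilon$ and offspring variance $\sigma^2$ survives with probability about $2\varepsilon/\sigma^2$; for $c_0 \cdot \mathrm{Bi}$ with mean close to 1 one has $\sigma^2 \approx c_0$.) The sketch below, with hypothetical helper names, computes both variances exactly from the distributions.

```python
from math import comb


def binomial_pmf(N, p):
    return {i: comb(N, i) * p**i * (1 - p) ** (N - i) for i in range(N + 1)}


def scale(c, pmf):
    """Distribution of c . X: the value c*i taken with probability Pr(X = i)."""
    return {c * i: prob for i, prob in pmf.items()}


def independent_sum(pmf_a, pmf_b):
    """Distribution of A + B for independent A and B."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


def variance(pmf):
    mean = sum(x * prob for x, prob in pmf.items())
    return sum((x - mean) ** 2 * prob for x, prob in pmf.items())


X = binomial_pmf(10, 0.3)
var_x = variance(X)                                 # 10 * 0.3 * 0.7
var_scaled = variance(scale(2, X))                  # 2 . X, two identical copies
var_two_copies = variance(independent_sum(X, X))    # two independent copies
print(var_x, var_scaled, var_two_copies)
```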

It is immediately clear that $\mathcal{T}^*$ forms an upper coupling for BFS($J$). That $\mathcal{T}_*$ is a lower coupling (whp) is less obvious, but will be proved later (see Lemma 7).

Remark 4. It is for this reason that we need $\varepsilon_* \ll \varepsilon$: then in the lower coupling $\mathcal{T}_*$ the expected number of children is still approximately $1+\varepsilon$, i.e. the lower coupling and upper coupling are still very similar.

At several points in the argument we will use the following form of the Chernoff bound for sums of indicator random variables (see e.g. [8, Theorem 2.1]).

Theorem 5. Let $X$ be the sum of finitely many i.i.d. Bernoulli random variables. Then for any $\zeta > 0$,
$$\Pr[X \ge \mathbb{E}(X) + \zeta] \le \exp\left(-\frac{\zeta^2}{2(\mathbb{E}(X) + \zeta/3)}\right), \qquad (1)$$
$$\Pr[X \le \mathbb{E}(X) - \zeta] \le \exp\left(-\frac{\zeta^2}{2\,\mathbb{E}(X)}\right). \qquad (2)$$
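As a sanity check of (1) and (2), the sketch below compares exact binomial tails with the two bounds for a concrete binomial random variable (a sum of i.i.d. Bernoulli variables). The parameters $N = 500$, $p = 0.1$ and the helper names are arbitrary choices for illustration.

```python
from math import comb, exp


def binomial_pmf(N, p, i):
    return comb(N, i) * p**i * (1 - p) ** (N - i)


def upper_tail(N, p, x):
    """Exact Pr[Bin(N, p) >= x] for an integer x."""
    return sum(binomial_pmf(N, p, i) for i in range(max(0, x), N + 1))


def lower_tail(N, p, x):
    """Exact Pr[Bin(N, p) <= x] for an integer x."""
    return sum(binomial_pmf(N, p, i) for i in range(0, min(N, x) + 1))


def chernoff_upper(mu, zeta):
    return exp(-zeta**2 / (2 * (mu + zeta / 3)))   # right-hand side of (1)


def chernoff_lower(mu, zeta):
    return exp(-zeta**2 / (2 * mu))                # right-hand side of (2)


N, p = 500, 0.1
mu = N * p                                          # E(X) = 50
for zeta in (5, 10, 20, 30):
    print(zeta,
          upper_tail(N, p, 50 + zeta), chernoff_upper(mu, zeta),
          lower_tail(N, p, 50 - zeta), chernoff_lower(mu, zeta))
```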

4. Subcritical phase

In this section we prove Theorem 2(a). The proof idea here is the same as that of the subcritical graph case as proved by Krivelevich and Sudakov [10]. We observe that a component of size $m$ must have at least $m/c_0$ edges, which were found within an interval of length at most $m\binom{n-j}{k-j}$ of the search process. Let us consider the probability that an interval of this length contains so many edges:
$$\Pr\left[\mathrm{Bi}\left(m\binom{n-j}{k-j}, (1-\varepsilon)p_0\right) \ge \frac{m}{c_0}\right] \overset{\text{Thm 5}}{\le} \exp\left(-\frac{\varepsilon^2 m^2/c_0^2}{2\left((1-\varepsilon)m/c_0 + \varepsilon m/(3c_0)\right)}\right) \le \exp\left(-\frac{\varepsilon^2 m}{2c_0}\right).$$
If $m \ge 3c_0 k\varepsilon^{-2}\log n$, then this probability is at most $n^{-3k/2} = o(n^{-k})$, and therefore we may take a union bound over all possible starting points for the interval, of which there are at most $\binom{n}{k} \le n^k$, and still have a probability of $o(1)$. In other words, with high probability no such interval exists, and therefore no component of size $m \ge 3c_0 k\varepsilon^{-2}\log n$ exists. Note that we were not concerned about optimising this bound on $m$. □
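The arithmetic in this proof can be checked numerically. The sketch below (hypothetical names, arbitrary sample values $n = 10^6$, $k = 3$, $j = 2$, $\varepsilon = 0.05$) evaluates the Chernoff expression with mean $(1-\varepsilon)m/c_0$ and deviation $\zeta = \varepsilon m/c_0$, confirms that it is at most the simplified bound, and confirms that at the threshold $m = 3c_0k\varepsilon^{-2}\log n$ the simplified bound equals $n^{-3k/2}$. Note that $n$ enters only through $m$, since the mean of the binomial does not depend on $\binom{n-j}{k-j}$ once $p = (1-\varepsilon)p_0$.

```python
from math import comb, exp, log


def chernoff_interval_bound(m, eps, c0):
    """Right-hand side of (1) for Pr[Bin(m * C(n-j, k-j), (1 - eps) p_0) >= m / c0]:
    the mean is (1 - eps) m / c0 and the deviation is zeta = eps * m / c0."""
    mu = (1 - eps) * m / c0
    zeta = eps * m / c0
    return exp(-zeta**2 / (2 * (mu + zeta / 3)))


def simplified_bound(m, eps, c0):
    """The weaker bound exp(-eps^2 m / (2 c0)) used in the proof."""
    return exp(-eps**2 * m / (2 * c0))


n, k, j, eps = 10**6, 3, 2, 0.05
c0 = comb(k, j) - 1
m = 3 * c0 * k * eps**-2 * log(n)      # the threshold on the component size
print(chernoff_interval_bound(m, eps, c0), simplified_bound(m, eps, c0), n ** (-3 * k / 2))
```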


5. Supercritical phase 

In this section we prove Theorem 2(b), which is substantially harder. We first gather some basic, preliminary results which we will use several times during the argument.

5.1. Preliminaries. Let $G_j = G_j(t; J)$ denote the $j$-uniform hypergraph with vertex set $V = [n]$ whose edges are those $j$-sets which have been discovered by BFS($J$) up to time $t$, i.e. having made precisely $t$ queries so far. We will omit the arguments $t$ and $J$ if they are clear from the context.

The following lemma (Lemma 12 from [5]) will play a critical role; it says that early on in the search process (in particular in the time period we are interested in) the edges of $G_j$ are nicely distributed in the sense that no set of vertices is contained in too many $j$-sets of $G_j$.

Lemma 6 (Bounded degree lemma [5]). Let $n^{-1+\delta} \ll \alpha \ll \varepsilon \ll 1$. Then there exists a constant $C = C(k,j)$ such that with probability at least $1 - \exp(-\Theta(n^{\delta/2}))$, for any $J, \ell, t$ satisfying

• $J$ is a $j$-set of vertices;

• $0 \le \ell \le j-1$;

• $t \le \alpha n^k$;

we have
$$\Delta_\ell(G_j(t; J)) \le C\alpha n^{j-\ell}.$$

As indicated in Section 2, the main additional difficulty for the hypergraph case is the smooth boundary lemma (Lemma 16), which may be seen as a significantly stronger version of Lemma 6. However, the proof of Lemma 16 makes fundamental use of Lemma 6, so the stronger result does not render Lemma 6 superfluous.

As indicated previously, we aim to use $\mathcal{T}_*$ and $\mathcal{T}^*$ as lower and upper couplings on BFS($J$) for some initial $j$-set $J$. Let us describe more precisely what we mean by this. In our search process we will certainly always make at most $\binom{n-j}{k-j}$ queries from any $j$-set. If the actual number is $x \le \binom{n-j}{k-j}$, then we identify these with the first $x$ queries from an individual of $\mathcal{T}^*$ (which we view as a subtree of the infinite $\binom{n-j}{k-j}$-ary tree in which each edge is present with probability $p$, and we consider the subtree containing the root). The remaining $\binom{n-j}{k-j} - x$ queries in $\mathcal{T}^*$ are in effect “dummy” queries which do not exist in the search process, but making extra queries is permissible for an upper bound. Thus if we are at time $t$ in the search process, then $\mathcal{T}^*$ may have made more than $t$ queries. However, since we will generally be considering generations of the search process rather than the exact time, this difference does not affect anything.

Similarly we can couple the search process with $\mathcal{T}_*$ by ignoring some queries which the search process makes, and considering only those queries which would give $c_0$ new $j$-sets, and of these consider only the first $(1-\varepsilon_*)\binom{n-j}{k-j}$. Of course, this requires that there are at least this many such queries in the search process, which we will prove using Lemma 6.

We will denote this coupling of processes (when it holds) by $\mathcal{T}_* \prec \mathrm{BFS}(J) \prec \mathcal{T}^*$.

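Lemma 7. With probability at least $1 - \exp(-\Theta(n^{\delta/2}))$, the process BFS($J$) satisfies the following properties:

(A) For every time $0 \le t \le \binom{n}{k}$, the number of edges which we have found up to time $t$ is
$$\begin{cases} (1 \pm \eta)p_0 t & \text{if } p_0 t \ge n^\delta,\\ \le (1+\eta)n^\delta & \text{otherwise.} \end{cases}$$

(B) Given $n^{-1+\delta} \ll \alpha \ll \varepsilon \ll 1$, after every $t \le \alpha n^k$ queries we have
$$\Delta_\ell(G_j) \le C\alpha n^{j-\ell}$$
for all $1 \le \ell \le j-1$ and for some constant $C = C(k,j)$.

(C) If $\alpha \ll \varepsilon_*$, then for every $t \le \alpha n^k$, at time $t$ we have $\mathcal{T}_* \prec \mathrm{BFS} \prec \mathcal{T}^*$.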


Proof. Property (A): Let $e_t$ denote the number of edges found by time $t$. Note that $e_t$ has distribution $\mathrm{Bi}(t,p)$. Let us first consider the case $t \ge n^\delta/p_0$. Then by Theorem 5,
$$\Pr\left(e_t \ne (1 \pm \eta)p_0 t\right) \le 2\exp\left(-\frac{\eta^2 n^{2\delta}}{3n^\delta}\right) \le 2\exp(-n^{\delta/2}).$$
Similarly if $t < n^\delta/p_0$ we have
$$\Pr\left(e_t \ge (1+\eta)n^\delta\right) \le \Pr\left(\mathrm{Bi}(n^\delta/p_0, p_0) \ge (1+\eta)n^\delta\right) \le \exp(-n^{\delta/2}).$$
Finally we apply a union bound over all times $t$ to bound the probability that Property (A) is not satisfied by
$$2\binom{n}{k}\exp(-n^{\delta/2}) \le \exp(-n^{\delta/2}/2) = \exp(-\Theta(n^{\delta/2})),$$
as required.

Property (B): Follows directly from Lemma 6.

Property (C): The upper coupling is immediate from the definitions. The lower coupling follows from Property (B). More precisely, from any $j$-set $J$ we can bound the number of discovered $j$-sets which intersect $J$ in $\ell$ vertices from above by $\binom{j}{\ell}\Delta_\ell(G_j)$. For each such discovered $j$-set, the number of $k$-sets containing its union with $J$ is at most $\binom{n-2j+\ell}{k-2j+\ell}$. Thus if Property (B) holds then the number of queries from $J$ which would give fewer than $c_0$ new $j$-sets is at most
$$\sum_{\ell=0}^{j-1}\binom{j}{\ell}\Delta_\ell(G_j)\binom{n-2j+\ell}{k-2j+\ell} \le \sum_{\ell=0}^{j-1}\binom{j}{\ell}C\alpha n^{j-\ell}\,n^{k-2j+\ell} = O(\alpha n^{k-j}).$$
Thus the number of $k$-sets which may still be queried from $J$ and which would give $c_0$ new $j$-sets is at least
$$\binom{n-j}{k-j} - O(\alpha n^{k-j}) \ge (1-\varepsilon_*)\binom{n-j}{k-j},$$
where the inequality holds since $1/n, \alpha \ll \varepsilon_*$. □

Remark 8. Note that Property (B) and the lower coupling in Property (C) only hold early in the process. For the rest of this proof we will always assume that we are at a time $t \le \alpha n^k$ in the search process, for some $\alpha \ll \varepsilon_*$. Formally, we introduce a stopping condition: we stop our search process at time $\alpha n^k$ if it has not already stopped, where we will have $\alpha = \Theta(\lambda) \ll \varepsilon_*$. However, we will have other stopping conditions, and it will follow from those that with very high probability we will never reach time $\alpha n^k$ (see Lemma ??).

Remark 9. Throughout the paper we will have various claims and lemmas stating that a certain good event holds with very high probability, generally at least $1 - \exp(-\Theta(n^{\delta/2}))$ (though later also $1 - \exp(-\Theta(n^{\delta/4}))$ appears). Without explicitly stating so, we will subsequently assume that the good event always holds. More formally, we introduce a new stopping condition for each lemma, and terminate the process if the corresponding good event does not hold. By a union bound over all bad events, as long as there are not too many of them, with very high probability no such stopping condition is ever invoked (note that $P(n)\cdot\exp(-\Theta(n^{\delta/2})) = \exp(-\Theta(n^{\delta/2}))$ for any polynomial $P$).

We prove one more preliminary result which says that the expansion of the search process is approximately as fast as we expect (once the boundary becomes large). In order to state the result, we first give some general notation: consider the search process BFS($J$). Then we write $C_J = C_J(i)$ for the (partial) component that we have “currently” discovered, i.e. up to the $i$-th generation. If the meaning of “currently” is clear we sometimes drop the argument $i$. Similarly we use $\partial C_J = \partial C_J(i)$ to denote the $i$-th generation of the process.

Lemma 10 (Bounded expansion). Suppose that for some initial $j$-set $J$, at generation $i$, $\mathcal{T}_* \prec \mathrm{BFS}(J) \prec \mathcal{T}^*$. Conditioned on $|\partial C_J(i)| = x_i \in \mathbb{N}$, with probability at least $1 - \exp(-\Theta(n^{\delta/2}))$
$$\begin{cases} |\partial C_J(i+1)| = (1 \pm 2\varepsilon_*)(1+\varepsilon)x_i & \text{if } x_i \ge n^{1-\delta},\\ |\partial C_J(i+1)| \le 2\max(x_i, n^\delta) & \text{otherwise.} \end{cases}$$

Note that the second half of the statement gives us only a much weaker upper bound, but applies to smaller $x_i$.

Proof. We prove the upper bound for the first half of the statement here. The other cases are very similar and are given in Appendix A for completeness.

Note that since $\mathcal{T}^*$ is an upper coupling, conditioned on $|\partial C_J(i)| = x_i$, the size of the next generation $|\partial C_J(i+1)|$ is stochastically dominated by a random variable $Y_{i+1}$ with distribution $c_0 \cdot \mathrm{Bi}\left(x_i\binom{n-j}{k-j}, p\right)$. We have
$$\mathbb{E}(Y_{i+1}) = c_0 x_i\binom{n-j}{k-j}p = (1+\varepsilon)x_i.$$
Furthermore, by the Chernoff bound (Theorem 5) we have
$$\begin{aligned}
\Pr\left(|\partial C_J(i+1)| > (1+2\varepsilon_*)(1+\varepsilon)x_i\right) &\le \Pr\left(Y_{i+1} > (1+2\varepsilon_*)(1+\varepsilon)x_i\right)\\
&= \Pr\left(\frac{Y_{i+1}}{c_0} > (1+2\varepsilon_*)\,\mathbb{E}\left(\frac{Y_{i+1}}{c_0}\right)\right)\\
&\le \exp\left(-\frac{(2\varepsilon_*)^2\,\mathbb{E}(Y_{i+1})}{3c_0}\right)\\
&= \exp\left(-\Theta(\varepsilon_*^2 x_i)\right)\\
&\le \exp\left(-\Theta(n^{-1+2\delta}\,n^{1-\delta})\right)\\
&\le \exp\left(-\Theta(n^{\delta/2})\right),
\end{aligned}$$
where for the penultimate inequality we used the facts that $\varepsilon_* \gg n^{-1/2+\delta}$ and $x_i \ge n^{1-\delta}$. This concludes the proof of the upper bound for the first half of the statement. □

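We have the following corollary. Let $B_i$ be the event
$$B_i := \left\{|\partial C_J(i)| \ge n^{1-\delta} \wedge |\partial C_J(i+1)| \ne (1 \pm 2\varepsilon_*)(1+\varepsilon)|\partial C_J(i)|\right\} \vee \left\{|\partial C_J(i+1)| > 2\max(|\partial C_J(i)|, n^\delta)\right\}.$$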

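Corollary 11. With probability at least $1 - \exp(-\Theta(n^{\delta/2}))$, no $B_i$ occurs for any $i$ such that $\mathcal{T}_* \prec \mathrm{BFS}(J) \prec \mathcal{T}^*$.

Proof. For each $i$ and for each $x_i \in \mathbb{N}$, let $A_i(x_i) := \{|\partial C_J(i)| = x_i\}$. Then
$$\begin{aligned}
\Pr(B_i) &= \sum_{x_i}\Pr(B_i \mid A_i(x_i))\Pr(A_i(x_i))\\
&\le \sum_{x_i}\Pr\left(|\partial C_J(i+1)| > 2\max(x_i, n^\delta) \mid A_i(x_i)\right)\Pr(A_i(x_i))\\
&\quad + \sum_{x_i \ge n^{1-\delta}}\Pr\left(|\partial C_J(i+1)| \ne (1 \pm 2\varepsilon_*)(1+\varepsilon)x_i \mid A_i(x_i)\right)\Pr(A_i(x_i))\\
&\le \sum_{x_i}\exp(-\Theta(n^{\delta/2}))\Pr(A_i(x_i)) + \sum_{x_i \ge n^{1-\delta}}\exp(-\Theta(n^{\delta/2}))\Pr(A_i(x_i))\\
&\le 2\exp(-\Theta(n^{\delta/2})),
\end{aligned}$$
and a union bound over all choices of $i$ (of which there are certainly fewer than $n^k$) gives the desired result. □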


Note that since $\varepsilon_* \ll \varepsilon$, this lemma implies that in the time range that we are usually interested in (i.e. when $x_i \ge n^{1-\delta}$), with high probability the size of the boundary never decreases. This is useful since we later have various concentration results which always require the boundary to have a certain minimum size. This lemma ensures that these conditions will always remain valid once they are attained.

Remark 12. The pattern of Lemma 10 and Corollary 11 will be repeated several times: we prove that a certain event holds with high probability, where the event is dependent on a random variable (i.e. the size of the boundary) but the probability is not. With this uniform probability we may always assume that the good event holds (if necessary taking a union bound, for example over different generations). From now on, we will only give the lemma and use the corresponding corollary implicitly without stating it.
5.2. Branching processes. We will also need some results on branching processes. The arguments here are similar to standard and well-known arguments, but the actual processes may have an unfamiliar distribution, so for completeness we give the full arguments in Appendix B.


5.2.1. Survival probability. Consider the branching process $\mathcal{T}^*$, in which the number of children has distribution $c_0 \cdot \mathrm{Bi}\big(\binom{n}{k-j}, p\big)$. For the supercritical case we have $p = (1+\varepsilon)p_0$. By setting up a suitable recursion, we may deduce that the survival probability $\varrho$ of this process is the unique positive solution to the equation

$$1-\varrho = \sum_{i=0}^{\binom{n}{k-j}}\Pr\Big(\mathrm{Bi}\Big(\tbinom{n}{k-j}, p\Big) = i\Big)(1-\varrho)^{c_0 i},$$

which is asymptotically

$$\varrho = (1+o(1))\frac{2\varepsilon}{c_0} \tag{3}$$

(see Appendix B). Likewise the lower coupling process $\mathcal{T}_*$, in which the number of children has distribution $c_0 \cdot \mathrm{Bi}\big((1-\varepsilon_*)\binom{n}{k-j}, p\big)$, has survival probability $\varrho_*$ with

$$\varrho_* = (1+o(1))\frac{2\varepsilon}{c_0}. \tag{4}$$
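As a numerical sanity check of (3), the following Python sketch computes $\varrho$ directly as one minus the smallest fixed point of the offspring generating function $G(s) = (1-p+ps^{c_0})^m$, where $m$ stands for $\binom{n}{k-j}$. This is an illustration under assumed parameters, not the argument of Appendix B; the function name and the choice $k=3$, $j=1$ (so $c_0 = 2$) in the test are our own.

```python
import math

def survival_probability(c0, m, p, tol=1e-12, max_iter=10**6):
    """Survival probability of a Galton-Watson process whose offspring
    count is c0 * Bin(m, p); m plays the role of binom(n, k - j).

    The extinction probability is the smallest fixed point in [0, 1] of
    G(s) = (1 - p + p * s**c0) ** m, and iterating q <- G(q) from q = 0
    increases monotonically towards it.
    """
    q = 0.0
    for _ in range(max_iter):
        # G(q) computed via log1p, which stays accurate for tiny p and huge m.
        q_next = math.exp(m * math.log1p(-p * (1.0 - q ** c0)))
        if abs(q_next - q) < tol:
            return 1.0 - q_next
        q = q_next
    return 1.0 - q
```

With $n = 2000$ and $\varepsilon = 0.01$ the computed value lies within about one percent of $2\varepsilon/c_0$, in line with the $(1+o(1))$ error in (3). Calling the same function with $m$ replaced by $\lfloor(1-\varepsilon_*)m\rfloor$ gives $\varrho_*$ as in (4).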


5.2.2. Dual process. When we explore a component and the lower coupling $\mathcal{T}_*$ survives indefinitely, we can conclude that the component must be large, since the search process certainly survives until $\mathcal{T}_*$ is no longer a lower coupling. However, if the upper coupling $\mathcal{T}^*$ dies out, we need some information on the total size of all its generations in order to decide whether this implies that the component must be small.

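Recall that $\mathcal{T}^*$ has the offspring distribution

$$c_0 \cdot \mathrm{Bi}\Big(\binom{n}{k-j}, p\Big),$$

where $p = (1+\varepsilon)p_0$. We denote the event that $\mathcal{T}^*$ dies out by $\mathcal{D}$ and condition on it. This defines a conditional branching process $\mathcal{T}_{\mathcal{D}}$, called the dual process, with offspring distribution

$$c_0 \cdot \mathrm{Bi}\Big(\binom{n}{k-j}, p_{\mathcal{D}}\Big),$$

where

$$p_{\mathcal{D}} = (1-\varepsilon+o(\varepsilon))p_0$$

(see Appendix B for a proof). Thus the expected number of children of any individual in $\mathcal{T}_{\mathcal{D}}$ is given by

$$c_0\binom{n}{k-j}p_{\mathcal{D}} = 1-\varepsilon+o(\varepsilon),$$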

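and hence in particular the dual process is a subcritical branching process. Therefore we can give an asymptotic estimate on the expected total size of $\mathcal{T}_{\mathcal{D}}$ by standard techniques (see [H]):

$$\mathbb{E}(|\mathcal{T}_{\mathcal{D}}|) = \sum_{i \ge 0}(1-\varepsilon+o(\varepsilon))^i = \frac{1}{1-(1-\varepsilon+o(\varepsilon))} = (1+o(1))\varepsilon^{-1}. \tag{5}$$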
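Consequently we can bound the probability of the process $\mathcal{T}^*$ being larger than some $A = A(n)$ by conditioning on $\mathcal{D}$ and applying Markov's inequality:

$$\begin{aligned}
\Pr(|\mathcal{T}^*| \ge A) &= \Pr(\neg\mathcal{D})\cdot 1 + \Pr(\mathcal{D})\cdot\Pr(|\mathcal{T}^*| \ge A \mid \mathcal{D})\\
&\le \Pr(\neg\mathcal{D}) + 1\cdot\Pr(|\mathcal{T}_{\mathcal{D}}| \ge A)\\
&\overset{(3),(5)}{\le} \varrho + (1+o(1))\varepsilon^{-1}/A\\
&= (1+o(1))\frac{2\varepsilon}{c_0},
\end{aligned}\tag{6}$$

as long as $\varepsilon^2 A \to \infty$.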
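The dual process and the bound (6) can also be checked numerically. For the offspring law $c_0\cdot\mathrm{Bi}(m,p)$ with $m = \binom{n}{k-j}$, a short generating-function computation (included only as an illustration, not as the proof of Appendix B) shows that conditioning on extinction, which has probability $q = 1-\varrho$, turns $G(s)$ into $G(qs)/q = (1-p_{\mathcal{D}}+p_{\mathcal{D}}s^{c_0})^m$ with $p_{\mathcal{D}} = pq^{c_0}/(1-p+pq^{c_0})$. The Python sketch below, with hypothetical helper names and illustrative parameters, computes $p_{\mathcal{D}}$, the mean $c_0mp_{\mathcal{D}}$, the expected total size from (5) and the right-hand side of (6) before simplification.

```python
import math

def extinction_probability(c0, m, p, tol=1e-12, max_iter=10**6):
    """Smallest fixed point q of G(s) = (1 - p + p*s**c0)**m, by iterating from 0."""
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp(m * math.log1p(-p * (1.0 - q ** c0)))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

def dual_edge_probability(c0, m, p, q):
    """p_D with G(q*s)/q == (1 - p_D + p_D*s**c0)**m."""
    return p * q ** c0 / (1.0 - p + p * q ** c0)

def tail_bound(c0, m, p, A):
    """Pr(not D) + E|T_D| / A, i.e. the Markov-inequality bound behind (6)."""
    q = extinction_probability(c0, m, p)
    mean_dual = c0 * m * dual_edge_probability(c0, m, p, q)
    expected_dual_size = 1.0 / (1.0 - mean_dual)   # geometric sum, as in (5)
    return (1.0 - q) + expected_dual_size / A
```

With $\varepsilon = 0.01$ and $A = 10^6$ (so $\varepsilon^2A = 100$), the Markov term $\mathbb{E}|\mathcal{T}_{\mathcal{D}}|/A$ is about two orders of magnitude below $\varrho$, so the bound is dominated by the survival probability, as (6) asserts.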

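5.3. First moment. Let $X$ denote the number of $j$-sets in components of size at least

$$\Lambda := \lambda n^j,$$

and observe that $\varepsilon^2\Lambda \to \infty$ is satisfied. The principal difficulty is in showing that $X$ is with high probability approximately as we would expect. We first calculate this expectation using the upper coupling $\mathcal{T}^*$ and the lower coupling $\mathcal{T}_*$, and obtain

$$\mathbb{E}(X) \le \binom{n}{j}\Pr(|\mathcal{T}^*| \ge \Lambda) \overset{(6)}{\le} (1+o(1))\frac{2\varepsilon}{c_0}\binom{n}{j}$$

and similarly

$$\mathbb{E}(X) \ge \binom{n}{j}\Pr(|\mathcal{T}_*| = \infty) = \varrho_*\binom{n}{j} \overset{(4)}{=} (1+o(1))\frac{2\varepsilon}{c_0}\binom{n}{j}.$$

Hence we have

$$\mathbb{E}(X) = (1\pm o(1))\frac{2\varepsilon}{c_0}\binom{n}{j}. \tag{7}$$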

5.4. Second moment. Let $\mathcal{C}$ denote the union of all components of size at least $\Lambda$. In order to apply Chebyshev's inequality to prove that $X$ is concentrated around its expectation, we need to show that $\mathbb{E}(X^2) = (1+o(1))\mathbb{E}(X)^2$. We may interpret $X^2$ as the number of ordered pairs of $j$-sets in large components (formally we may pick the same $j$-set twice in such a pair), and thus we can write its expectation as

$$\mathbb{E}(X^2) = \sum_{J_1, J_2}\Pr(J_1, J_2 \in \mathcal{C}).$$

Fix an arbitrary $j$-set $J_1$. We grow a component $C_{J_1}$ from $J_1$ using BFS. We denote the upper coupling branching process for the exploration by $\mathcal{T}^{J_1}$ (i.e. $\mathcal{T}^{J_1}$ is a particular instance of $\mathcal{T}^*$). We continue to grow the component until one of the following three stopping conditions is reached:

(S1) The component $C_{J_1}$ is fully explored (i.e. no $j$-sets are still active);

(S2) The component $C_{J_1}$ has reached size $\Lambda = \lambda n^j$;

(S3) The boundary $\partial C_{J_1}$ of the component has reached size $\lambda\Lambda = \lambda^2 n^j$.
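To make the “end of generation” convention concrete, the following Python sketch grows a branching process with offspring $c_0\cdot\mathrm{Bi}(m,p)$ one whole generation at a time, as a stand-in for the upper coupling $\mathcal{T}^{J_1}$ rather than for BFS on an actual hypergraph, and reports which of (S1), (S2) or (S3) fired. The function and parameter names (`size_cap` for $\Lambda$, `boundary_cap` for $\lambda\Lambda$) are hypothetical, and the small parameter values in the test are chosen only so that the simulation runs quickly.

```python
import random

def explore_generations(c0, m, p, size_cap, boundary_cap, rng):
    """Hypothetical sketch: grow a branching process with c0 * Bin(m, p)
    children per individual, one whole generation at a time, and report
    which stopping condition fired together with the total size reached.

    Conditions are tested only after a complete generation has been produced,
    so (S2) and (S3) may overshoot their thresholds (compare Remark 13).
    """
    generation, total = 1, 1          # start from a single active j-set
    while True:
        if generation == 0:
            return "S1", total        # fully explored
        if total >= size_cap:
            return "S2", total        # component reached size Lambda
        if generation >= boundary_cap:
            return "S3", total        # boundary reached size lambda*Lambda
        # every active individual makes m queries, each an edge with prob. p,
        # and every edge contributes c0 new individuals to the next generation
        edges = sum(1 for _ in range(generation * m) if rng.random() < p)
        generation = c0 * edges
        total += generation
```

Because (S1) can only fire after a generation of size zero, a run stopped by (S1) never reaches `size_cap`, while runs stopped by (S2) or (S3) may exceed their thresholds by at most the last generation.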

A technical point on both the second and the third stopping conditions is that we check them only at the end of each generation, i.e. we do not stop in the middle of a generation. This is in contrast to the stopping procedure in the graph case in [3]: in the hypergraph case, stopping immediately would lead to significant technical difficulties due to the boundary being spread over two generations. Our convention is much more convenient once paired with Corollary 11, which automatically implies that neither the boundary nor the whole component ends up being significantly bigger than the threshold for the stopping condition, as formulated in the following remark.

Remark 13. With probability at least $1-\exp(-\Theta(n^{\delta/2}))$, when the stopping conditions are reached we have

$$|\partial C_{J_1}| \le 2\lambda\Lambda = 2\lambda^2 n^j; \qquad |C_{J_1}| \le 2\Lambda = 2\lambda n^j.$$

Having grown $C_{J_1}$ in this way, let us first consider the contribution to $\mathbb{E}(X^2)$ that comes from $j$-sets $J_2$ which lie inside $C_{J_1}$. By Remark 13 there are at most $2\Lambda$ such $j$-sets, and therefore the contribution is at most

$$\sum_{J_1}\sum_{J_2 \in C_{J_1}}\Pr(J_1 \in \mathcal{C}) \le \binom{n}{j}\,2\Lambda\cdot O(\varepsilon) = O(\varepsilon\lambda n^{2j}) = o(\varepsilon^2 n^{2j}). \tag{8}$$

Since $\mathbb{E}(X^2) \ge \mathbb{E}(X)^2 = \Theta(\varepsilon^2 n^{2j})$, this contribution is negligible.

Let us therefore assume that $J_2$ lies outside $C_{J_1}$ and fix it for the remainder of the proof. We delete all the $j$-sets of $C_{J_1}$ from $\mathcal{H}^k(n,p)$; any $k$-sets containing them may no longer be queried.

We now start a new BFS process from $J_2$ and thus grow a component $C_{J_2}$ until one of the following two stopping conditions is reached:

(T1) The component $C_{J_2}$ is fully explored;

(T2) The component $C_{J_2}$ has reached size $\Lambda = \lambda n^j$.

Again, we only stop at the end of a generation. The following lemma will be critical to the argument.

Lemma 14. With probability at least $1-\exp(-\Theta(n^{\delta/2}))$, $\mathcal{T}_*$ is a lower coupling for the search processes of $C_{J_1}$ and $C_{J_2}$ until these stopping conditions are reached.

Proof. Note that when we stop exploring $C_{J_1}$, because of stopping condition (S2) and Property (A) we have made $O(\lambda n^k)$ queries in BFS. Therefore, by Property (C), $\mathcal{T}_*$ is a lower coupling for BFS throughout the exploration of $C_{J_1}$.

We would like to say something similar about the second search process for $C_{J_2}$. However, we cannot simply apply Property (C), since it relied on Lemma 6, which only applies to a search process within the complete hypergraph. The hypergraph in which we run the search process for $C_{J_2}$ is not complete, since the $j$-sets of $C_{J_1}$ have been deleted.

We solve this problem by considering an equivalent auxiliary search process in the complete hypergraph, but in which we reduce some edge probabilities to zero, specifically those of $k$-sets containing $j$-sets of $C_{J_1}$. It is clear that this process is dominated by the usual process in the complete hypergraph, to which we can apply Lemma 6. However, we still need to know that the number of queries in this auxiliary search process is $O(\lambda n^k)$, i.e. that we have not made too many additional dummy queries.

To prove this, we simply observe that the number of $j$-sets of $C_{J_1}$ is at most $2\lambda n^j$, by Remark 13, and thus the number of $k$-sets containing such a $j$-set is at most $2\lambda n^j\binom{n-j}{k-j} = O(\lambda n^k)$. Therefore the total number of additional queries that we can possibly make is $O(\lambda n^k)$, as required. □

Lemma 14 tells us that $\mathcal{T}_*$ is a lower coupling for both the first and the second search processes (which also means that $\mathcal{T}^*$ is a good approximation). Therefore, as calculated before for $C_{J_1}$, the probability that $C_{J_2}$ grows large is $(1\pm o(1))2\varepsilon/c_0$.

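Let us now compare the reasons why the search processes stop with the events that the corresponding components are large. First, observe that if the exploration of $C_{J_1}$ stopped because the component was fully explored (S1), then it never reached size $\Lambda$, and therefore $J_1$ does not lie in a large component. Consequently it will not contribute to $X^2$. Hence we are interested in the case when the exploration of the component of $J_1$ stops due to stopping condition (S2) or (S3), and we call this event $\mathcal{E}$. From the previous observation it is immediate that

$$\{J_1 \in \mathcal{C}\} \Rightarrow \mathcal{E},$$

and thus

$$\Pr(J_1, J_2 \in \mathcal{C}) \le \Pr(\mathcal{E} \wedge \{J_2 \in \mathcal{C}\}) = \Pr(\mathcal{E})\Pr(J_2 \in \mathcal{C} \mid \mathcal{E}).$$

Consequently we have

$$\begin{aligned}
\mathbb{E}(X^2) &\le \binom{n}{j}\Pr(\mathcal{E})\sum_{J_2 \in \binom{[n]}{j}\setminus C_{J_1}}\Pr(J_2 \in \mathcal{C} \mid \mathcal{E}) + o(\varepsilon^2 n^{2j})\\
&\le (1+o(1))\binom{n}{j}\Pr(\mathcal{E})\sum_{J_2 \in \binom{[n]}{j}\setminus C_{J_1}}\Pr(J_2 \in \mathcal{C} \mid \mathcal{E}).
\end{aligned}\tag{9}$$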

In order to give a suitable upper bound for $\Pr(\mathcal{E})$, we have to analyse the upper coupling $\mathcal{T}^{J_1}$ if we stop due to stopping condition (S2) or (S3). For technical reasons we distinguish two cases in a way that might seem a little awkward: if we stopped because of (S2), then clearly $J_1$ does lie in a large component, and therefore the total size of $\mathcal{T}^{J_1}$ will be at least $\Lambda$, since we had already reached that size when we stopped the exploration. This motivates a case distinction according to the following implication:

$$\mathcal{E} \Rightarrow \{|\mathcal{T}^{J_1}| \ge \Lambda\} \vee \big(\mathcal{E} \wedge \{|\mathcal{T}^{J_1}| < \Lambda\}\big). \tag{10}$$

Let $\mathcal{W}$ be the event that the generation of $\mathcal{T}^{J_1}$ at which we stopped the exploration process has size at least $\lambda\Lambda$. Observe that

$$\mathcal{E} \wedge \{|\mathcal{T}^{J_1}| < \Lambda\} \Rightarrow \mathcal{W} \wedge \{|\mathcal{T}^{J_1}| < \Lambda\}, \tag{11}$$

since the event $\mathcal{E}$ can only hold (subject to $\{|\mathcal{T}^{J_1}| < \Lambda\}$) if the boundary $\partial C_{J_1}$ of the explored component had size at least $\lambda\Lambda$, and hence the corresponding generation of the upper coupling must also have been large, i.e. $\mathcal{W}$ holds.
Since

$$\{|\mathcal{T}^{J_1}| < \Lambda\} \Rightarrow \{|\mathcal{T}^{J_1}| < \infty\}, \tag{12}$$

we want to consider the event that $\mathcal{T}^{J_1}$ dies out after having had a large generation. Intuitively we would expect the chance of this happening to be very small, so let us make this intuition precise.
Assume that $\mathcal{W}$ holds. Then there is a generation of $\mathcal{T}^{J_1}$ with at least $\lambda\Lambda$ $j$-sets in the boundary, and from each of these we start an independent copy of $\mathcal{T}^*$, all of which need to die out in order for $\mathcal{T}^{J_1}$ to die out. Thus we have

$$\Pr(|\mathcal{T}^{J_1}| < \infty \mid \mathcal{W}) \le \Pr(|\mathcal{T}^*| < \infty)^{\lambda\Lambda} \le \Big(1-\frac{\varepsilon}{2^k}\Big)^{\lambda\Lambda} \le \exp\Big(-\frac{\varepsilon\lambda\Lambda}{2^k}\Big) = o(1), \tag{13}$$

since the survival probability of $\mathcal{T}^*$ is $(1\pm o(1))2\varepsilon/c_0 \ge \varepsilon/2^k$. This implies

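$$\begin{aligned}
\Pr(\mathcal{W} \wedge \{|\mathcal{T}^{J_1}| < \infty\}) &= \Pr(\mathcal{W})\Pr(|\mathcal{T}^{J_1}| < \infty \mid \mathcal{W})\\
&= \Pr(\mathcal{W} \wedge \{|\mathcal{T}^{J_1}| = \infty\})\,\frac{\Pr(|\mathcal{T}^{J_1}| < \infty \mid \mathcal{W})}{\Pr(|\mathcal{T}^{J_1}| = \infty \mid \mathcal{W})}\\
&\overset{(13)}{\le} \Pr(|\mathcal{T}^{J_1}| = \infty)\cdot o(1)\\
&\overset{(3)}{=} o(\varepsilon).
\end{aligned}\tag{14}$$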


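Putting everything together, we obtain

$$\Pr(\mathcal{E}) \overset{(10),(11),(12)}{\le} \Pr(|\mathcal{T}^{J_1}| \ge \Lambda) + \Pr(\{|\mathcal{T}^{J_1}| < \infty\} \wedge \mathcal{W}) \overset{(6),(14)}{\le} \frac{2\varepsilon}{c_0} + o(\varepsilon). \tag{15}$$

Substituting (15) into (9), we obtain

$$\mathbb{E}(X^2) \le (1+o(1))\frac{2\varepsilon}{c_0}\binom{n}{j}\sum_{J_2 \in \binom{[n]}{j}\setminus C_{J_1}}\Pr(J_2 \in \mathcal{C} \mid \mathcal{E}). \tag{16}$$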


Let us now consider the component containing a given $J_2 \in \binom{[n]}{j}\setminus C_{J_1}$. If $C_{J_2}$ has size at least $\Lambda$, then certainly the component containing $J_2$ in $\mathcal{H}^k(n,p)$ has size at least $\Lambda$. On the other hand, if the exploration of $C_{J_2}$ stops because of (T1), then it may be that in fact the whole component is large, but we missed some of it because of the $j$-sets of $C_{J_1}$ which we deleted. We would therefore like to say that the number of possible queries between these two sets is small, so that we are unlikely to have missed any edges because of deleting $C_{J_1}$.

We first observe that such queries can only occur between $C_{J_2}$ and the boundary $\partial C_{J_1}$ of $C_{J_1}$ (any $j$-sets of $C_{J_1}$ not in $\partial C_{J_1}$ were already fully explored). The intuition behind the argument is that $C_{J_2}$ remains small, while the boundary of $C_{J_1}$ is very small, so the number of pairs of $j$-sets, one from each side, should still be small. We might therefore expect that there are very few $k$-sets containing such pairs.

The problem with this basic argument is that the number of $k$-sets containing a pair of $j$-sets depends heavily on the size of their intersection. While most pairs of $j$-sets do not intersect, those which do carry disproportionately large weight, because there are many more $k$-sets containing both of them.
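The following Python sketch makes this weighting explicit. Two $j$-sets sharing $t$ vertices span $2j-t$ vertices, so they lie together in $\binom{n-2j+t}{k-2j+t}$ $k$-sets (and in none if $2j-t > k$), while a fixed $j$-set meets $\binom{j}{t}\binom{n-j}{j-t}$ $j$-sets in exactly $t$ vertices. The function names are illustrative; the brute-force count on a small ground set only confirms the counting formula.

```python
from itertools import combinations
from math import comb

def ksets_containing_both(n, k, j, t):
    """Number of k-subsets of [n] containing two fixed j-sets that share t vertices."""
    union = 2 * j - t            # vertices covered by the two j-sets together
    if union > k:
        return 0
    return comb(n - union, k - union)

def weight_by_intersection(n, k, j, t):
    """Total number of (partner j-set, common k-set) pairs for a fixed j-set,
    over partners meeting it in exactly t vertices."""
    partners = comb(j, t) * comb(n - j, j - t)
    return partners * ksets_containing_both(n, k, j, t)

def brute_force(n, k, j, t):
    """Direct count for a concrete pair of j-sets with |J1 & J2| == t (small n only)."""
    J1 = set(range(j))
    J2 = set(range(j - t, 2 * j - t))   # shares exactly t vertices with J1
    return sum(1 for S in combinations(range(n), k) if J1 <= set(S) and J2 <= set(S))
```

For $n = 1000$, $k = 4$, $j = 2$, the $1996$ partners meeting a fixed $j$-set in one vertex outweigh the roughly $5\cdot 10^5$ disjoint partners by a factor of about four.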

We therefore aim to show that $\partial C_{J_1}$ is “smooth” in the sense that for any $0 \le \ell \le j-1$ and for any $\ell$-set $L$, the number of $j$-sets in $\partial C_{J_1}$ which contain $L$ is about the “right” number, and in particular almost the same regardless of the choice of $L$ (though dependent on $|L| = \ell$). This statement is formalised in the following lemma.
Let $i_1$ denote the generation at which $C_{J_1}$ hits one of the stopping conditions (S1), (S2), (S3). Recall that $d_L(\partial C_{J_1}(i))$ is the degree of $L$ in the $j$-uniform hypergraph with vertex set $V = [n]$ and edge set $\partial C_{J_1}(i)$, i.e. the number of $j$-sets of the boundary containing $L$.

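Lemma 15. Conditioned on $\mathcal{E}$, with probability at least $1-\exp(-\Theta(n^{\delta/2}))$, for every $\ell, L$ such that

• $0 \le \ell \le j-1$;

• $L$ is an $\ell$-set of vertices;

the following holds:

$$d_L(\partial C_{J_1}(i_1)) = (1\pm o(1))\frac{\binom{j}{\ell}|\partial C_{J_1}(i_1)|}{\binom{n}{\ell}}.$$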

We note that this lemma is considerably stronger than we need for the proof of our main result (for which concentration within a constant multiplicative factor would suffice). However, since the result is also interesting in its own right, we in fact prove a result which is even stronger than Lemma 15.


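Consider the process BFS starting from an arbitrary $j$-set $J$. Let $i_0(\ell)$ be the least $i$ such that $|\partial C_J(i)| \ge n^{\ell+\delta}$. For each $\ell = 1,\dots,j-1$, let

$$r_\ell := \frac{\binom{k-\ell}{j-\ell}-1}{\binom{k}{j}-1},$$

let

$$i^*(\ell) := \left\lceil\frac{(j-\ell)\log n}{-\log\big((1+2\eta+2\varepsilon)r_\ell\big)}\right\rceil,$$

and let $i_1(\ell) := i_0(\ell) + i^*(\ell)$.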


Lemma 16 (Smooth boundary lemma). Let $\varepsilon, p$ be as in Theorem 2 (b). With probability at least $1-\exp(-\Theta(n^{\delta/4}))$, using $\mathrm{BFS}(J)$, for every $J, \ell, L, i$ such that

• $J$ is a $j$-set of vertices;

• $0 \le \ell \le j-1$;

• $L$ is an $\ell$-set of vertices;

• $i_1(\ell) \le i \le i_1$;

the following holds:

$$d_L(\partial C_J(i)) = (1\pm\gamma_\ell)\frac{\binom{j}{\ell}|\partial C_J(i)|}{\binom{n}{\ell}}.$$


The intuition behind the various generations which we have defined is as follows. In order for various concentration results to be valid, we need the average degree of $\ell$-sets to be reasonably large, so $i_0(\ell)$ is a “starting time”. Prior to this time we have no information about the degrees of $\ell$-sets, so they may be very non-smooth. Over time, though, any disparity will tend to even itself out, and $i^*(\ell)$ is the time it takes for this process to be complete. From time $i_1(\ell)$ on, the degrees of $\ell$-sets should be smooth.
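The evening-out effect can be illustrated with an idealised, noise-free model. Ignoring all error terms, Claims 22 and 23 below say that jumps contribute $(1+\varepsilon)(1-r_\ell)$ times the average degree $\binom{j}{\ell}|\partial(i)|/\binom{n}{\ell}$ (using $\binom{k}{\ell}\binom{k-\ell}{j-\ell} = \binom{k}{j}\binom{j}{\ell}$), that neighbourhood branchings contribute $(1+\varepsilon)r_\ell d_L(i)$, and the boundary grows by a factor $1+\varepsilon$. The normalised degree $x_i$, i.e. $d_L(i)$ divided by the average, then follows $x_{i+1} = (1-r_\ell) + r_\ell x_i$, so $|x_i-1|$ shrinks by the factor $r_\ell$ per generation. The Python sketch below (function names hypothetical, and $\eta = \varepsilon = 0$ by default) iterates this toy recursion for $i^*(\ell)$ steps from an initial disparity $x_0 = n^{j-\ell}$ chosen for illustration.

```python
import math

def r_ell(k, j, ell):
    """r_ell = (binom(k-ell, j-ell) - 1) / (binom(k, j) - 1)."""
    return (math.comb(k - ell, j - ell) - 1) / (math.comb(k, j) - 1)

def smoothing_time(n, k, j, ell, eta=0.0, eps=0.0):
    """i*(ell) = ceil((j - ell) * log n / -log((1 + 2*eta + 2*eps) * r_ell))."""
    factor = (1 + 2 * eta + 2 * eps) * r_ell(k, j, ell)
    return math.ceil((j - ell) * math.log(n) / -math.log(factor))

def normalised_degree(x0, r, steps):
    """Iterate x -> (1 - r) + r * x, the noise-free recursion for d_L(i) / average."""
    x = x0
    for _ in range(steps):
        x = (1 - r) + r * x
    return x
```

For $n = 10^6$, $k = 4$, $j = 3$ and $\ell = 1$ we get $r_1 = 2/3$ and $i^*(1) = 69$, after which a deviation of $10^{12}$ from the average has dropped below $1$ in this toy model.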


Remark 17. Note that at time $i_0(\ell)$ we have $|\partial C_J(i_0(\ell))| \ge n^{\ell+\delta} > n^{1-\delta}$ for $\ell \ge 1$. This means that, by Corollary 11, the size of the generations will never decrease. This will be important for various concentration results.

Lemma 15 is almost an immediate corollary of Lemma 16; the only issue is that we need to know that $i_1 \ge i_1(\ell)$ for every $\ell = 1,\dots,j-1$, i.e. that we do not reach the stopping conditions before the boundary is smooth. This turns out to be
a surprisingly tricky fact to prove, but it will be shown in Section [3 

We will prove Lemma 16 in Section 6. First, we show how to use Lemma 15 to complete the proof of Theorem 2.
Conditioned on $\mathcal{E}$ and $\{|\mathcal{T}^{J_2}| < \infty\}$, we need to analyse the event $\mathcal{F}$ of $J_2$ being a (potential) false negative, i.e. $C_{J_2}$ contains fewer than $\Lambda$ $j$-sets but the component of $J_2$ in $\mathcal{H}^k(n,p)$ is larger than $C_{J_2}$. (In fact, a genuine false negative would require the component to be larger than $\Lambda$, but bounding the probability of this weaker event will be sufficient.)

We know, by Remark 13, that the boundary of $C_{J_1}$ has size at most $2\lambda\Lambda$, and each $\ell$-set is contained in $O(\lambda\Lambda/n^\ell)$ sets of the boundary, by Lemma 15. Thus for a $j$-set of $C_{J_2}$, summing over the size $\ell$ of its intersection with a $j$-set of $\partial C_{J_1}$, there are

$$\sum_{\ell=0}^{j-1}O(\lambda\Lambda/n^\ell)\,O(n^{k-2j+\ell}) = O(\lambda\Lambda n^{k-2j})$$

$k$-sets which we did not query because they contained $j$-sets of $C_{J_1}$. Therefore, if we assume that $C_{J_2}$ contains precisely $r \in \mathbb{N}$ $j$-sets, the expected number of edges within these disallowed $k$-sets is

$$O(\lambda\Lambda n^{k-2j}rp) = O(\lambda^2 r),$$

and thus the probability that we have overlooked at least one edge is $O(\lambda^2 r)$.
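Hence, by the law of total probability, we obtain

$$\begin{aligned}
\Pr(\mathcal{F} \mid \mathcal{E} \wedge \{|\mathcal{T}^{J_2}| < \infty\}) &= O\Big(\lambda^2\sum_{r \ge 0}r\Pr\big(|C_{J_2}| = r \mid \mathcal{E} \wedge \{|\mathcal{T}^{J_2}| < \infty\}\big)\Big)\\
&\le O\big(\lambda^2\,\mathbb{E}(|\mathcal{T}^{J_2}| \mid \mathcal{E} \wedge \{|\mathcal{T}^{J_2}| < \infty\})\big)\\
&\overset{\mathcal{T}^{J_2}\text{ indep. of }\mathcal{E}}{=} O\big(\lambda^2\,\mathbb{E}(|\mathcal{T}^{J_2}| \mid |\mathcal{T}^{J_2}| < \infty)\big)\\
&= O\big(\lambda^2\,\mathbb{E}(|\mathcal{T}_{\mathcal{D}}|)\big)\\
&\overset{(5)}{=} o(\varepsilon).
\end{aligned}\tag{17}$$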


This resolves the main difficulty (apart from the proofs of Lemmas 16 and 15).

Since we only need an upper bound for the probability that the component of $J_2$ in $\mathcal{H}^k(n,p)$ is large, we do not need to care about false positives. Therefore we consider $J_2$ to be large in the following three cases:

• $\mathcal{T}^{J_2}$ survives;

• $\mathcal{T}^{J_2}$ dies out and $J_2$ is a false negative;

• $\mathcal{T}^{J_2}$ dies out, $J_2$ is not a false negative and its component in $\mathcal{H}^k(n,p)$ is large.

Observe that every $j$-set $J_2 \in \mathcal{C}$ satisfies exactly one of these conditions.

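Still assuming that $\mathcal{E}$ holds, we calculate the probabilities of these events. For the first case we obtain

$$\Pr(|\mathcal{T}^{J_2}| = \infty \mid \mathcal{E}) = \Pr(|\mathcal{T}^{J_2}| = \infty) \sim \frac{2\varepsilon}{c_0}, \tag{18}$$

since the branching process $\mathcal{T}^{J_2}$ is independent of $\mathcal{E}$. Moreover, estimate (17) immediately shows

$$\Pr(\{|\mathcal{T}^{J_2}| < \infty\} \wedge \mathcal{F} \mid \mathcal{E}) \le \Pr(\mathcal{F} \mid \mathcal{E} \wedge \{|\mathcal{T}^{J_2}| < \infty\}) = o(\varepsilon). \tag{19}$$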
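For the last case, let us first note that

$$(\neg\mathcal{F} \wedge \{J_2 \in \mathcal{C}\}) \Rightarrow \{|C_{J_2}| \ge \Lambda\} \Rightarrow \{|\mathcal{T}^{J_2}| \ge \Lambda\},$$

and thus we have

$$\begin{aligned}
\Pr(\{|\mathcal{T}^{J_2}| < \infty\} \wedge \neg\mathcal{F} \wedge \{J_2 \in \mathcal{C}\} \mid \mathcal{E}) &\le \Pr(\Lambda \le |\mathcal{T}^{J_2}| < \infty \mid \mathcal{E})\\
&= \Pr(\Lambda \le |\mathcal{T}^{J_2}| < \infty)\\
&\le \Pr(|\mathcal{T}^{J_2}| \ge \Lambda \mid |\mathcal{T}^{J_2}| < \infty)\\
&= \Pr(|\mathcal{T}_{\mathcal{D}}| \ge \Lambda)\\
&\le (1+o(1))\varepsilon^{-1}/\Lambda = o(\varepsilon),
\end{aligned}\tag{20}$$

by Markov's inequality and since $\varepsilon^2\Lambda \to \infty$. Consequently, by estimates (18), (19) and (20), we obtain

$$\Pr(J_2 \in \mathcal{C} \mid \mathcal{E}) \le \frac{2\varepsilon}{c_0} + o(\varepsilon).$$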

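This yields a good enough upper bound on the second moment:

$$\mathbb{E}(X^2) \overset{(16)}{\le} (1+o(1))\frac{2\varepsilon}{c_0}\binom{n}{j}\sum_{J_2 \in \binom{[n]}{j}\setminus C_{J_1}}\Pr(J_2 \in \mathcal{C} \mid \mathcal{E}) \le (1+o(1))\Big(\frac{2\varepsilon}{c_0}\binom{n}{j}\Big)^2 \overset{(7)}{=} (1+o(1))\mathbb{E}(X)^2, \tag{21}$$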

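and therefore for the variance we have

$$\mathrm{Var}(X) = \mathbb{E}(X^2) - \mathbb{E}(X)^2 = o(\mathbb{E}(X)^2).$$

Hence Chebyshev's inequality tells us that for any constant $\zeta > 0$,

$$\Pr\big(X \ne (1\pm\zeta)\mathbb{E}(X)\big) \le \frac{\mathrm{Var}(X)}{\zeta^2\mathbb{E}(X)^2} = o(1/\zeta^2) = o(1),$$

and so whp

$$X = (1\pm o(1))\mathbb{E}(X) = (1\pm o(1))\frac{2\varepsilon}{c_0}\binom{n}{j}.$$

We have thus shown that the number of $j$-sets in large components is approximately as expected.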


5.5. Sprinkling. However, we also need to know that all (or at least almost all) $j$-sets of $\mathcal{C}$ lie in the same component. To prove this, we use a standard sprinkling argument.

We first borrow a lemma from [J], which states that we may pick out a subsequence of queries and treat it like an interval of the search process. To this end we denote by $X_t$, for each $t = 1,\dots,\binom{n}{k}$, the indicator random variable associated with the $t$-th query of the search process. We will be considering a random subsequence $t_1, t_2, \dots, t_s$ of $[\binom{n}{k}]$. We say “$t_i$ is determined by the values of $X_1,\dots,X_{t_i-1}$” to mean the following: for any $m$, whether the event $\{t_i = m\}$ holds is determined by the values of $X_1,\dots,X_{m-1}$. In particular this means that $t_i$ is chosen before $X_{t_i}$ is revealed.
Lemma 18. Let $S = (t_1, t_2, \dots, t_s)$ be a (random, ordered) index set chosen according to some criterion such that

• $t_i$ is determined by the values of $X_1,\dots,X_{t_i-1}$;

• with probability 1 we have $1 \le t_1 < t_2 < \dots < t_s \le \binom{n}{k}$.

Then $(X_{t_1},\dots,X_{t_s})$ is distributed as $(Y_1,\dots,Y_s)$, where $Y_1,\dots,Y_s$ are independent $\mathrm{Be}(p)$ variables. In particular, we may apply a Chernoff bound to $\sum_{i=1}^{s}X_{t_i}$.
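A quick simulation illustrates why the selection rule matters. In the Python sketch below, which is a hypothetical illustration and not part of the proof, query $t$ is selected whenever query $t-1$ was an edge; this rule looks only at $X_1,\dots,X_{t-1}$ and therefore satisfies the first condition of Lemma 18, and the fraction of edges among the selected queries stays close to $p$. A rule that peeks at $X_t$ itself, such as selecting exactly the successful queries, would violate the condition and produce an all-ones subsequence.

```python
import random

def adaptive_subsequence(num_queries, p, seed):
    """Reveal Bernoulli(p) queries one by one; select query t iff query t-1
    succeeded.  The decision for t is made before X_t is revealed."""
    rng = random.Random(seed)
    selected = []
    previous = 0
    for _ in range(num_queries):
        take = previous == 1                 # depends only on X_1, ..., X_{t-1}
        x = 1 if rng.random() < p else 0     # now reveal X_t
        if take:
            selected.append(x)
        previous = x
    return selected
```

With $2\cdot 10^5$ queries and $p = 0.3$, roughly $6\cdot 10^4$ queries are selected and their empirical edge frequency is within a fraction of a percent of $p$.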

This lemma applies in particular to subsequences given by the set of queries to $k$-sets containing a particular $\ell$-set while exploring any component, and we therefore have the following corollary.

Corollary 19. With probability at least $1-\exp(-\Theta(n^{\delta/2}))$, for every $1 \le \ell \le j-1$, for every $\ell$-set $L$ and for every component $C$, if the number of queries from $j$-sets in $C$ to a $k$-set containing $L$ is $x \ge n^{k-j+\delta}$, then the number of edges in $C$ containing $L$ is $(1\pm o(1))px$.

Proof. First fix $\ell$, $L$ and $C$. By Lemma 18 we may apply a Chernoff bound to the set of queries to $k$-sets containing $L$ within $C$. Since $px \ge pn^{k-j+\delta} = \Theta(n^\delta)$, the probability that the number of edges in $C$ containing $L$ is not in $(1\pm\zeta)px$, for some constant $\zeta > 0$, is at most

$$2\exp\Big(-\frac{px\zeta^2}{3}\Big) \le 2\exp(-\Theta(\zeta^2 n^\delta)).$$

We may now apply a union bound over all choices of $\ell$ and $L$ (of which there are at most $n^j$) and all choices of $C$ (of which there are at most $n^j$) and deduce that the probability that any one of these choices goes wrong is at most

$$2n^{2j}\exp(-\Theta(\zeta^2 n^\delta)) \le \exp(-n^{\delta/2}),$$

as required. □
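The Chernoff bound used here is the two-sided form $\Pr(|Z-\mu| \ge \zeta\mu) \le 2\exp(-\zeta^2\mu/3)$ for $Z \sim \mathrm{Bi}(N,p)$, $\mu = Np$ and $0 < \zeta \le 1$; we assume that this is the form stated in Theorem 5. The Python sketch below compares it with the exact binomial tail, computed in log space to avoid overflow; the parameter values are arbitrary.

```python
import math

def binom_pmf(N, z, p):
    """Pr(Bin(N, p) = z), evaluated via log-gamma to avoid overflow."""
    log_pmf = (math.lgamma(N + 1) - math.lgamma(z + 1) - math.lgamma(N - z + 1)
               + z * math.log(p) + (N - z) * math.log1p(-p))
    return math.exp(log_pmf)

def exact_two_sided_tail(N, p, zeta):
    """Pr(|Z - mu| >= zeta * mu) for Z ~ Bin(N, p)."""
    mu = N * p
    return sum(binom_pmf(N, z, p) for z in range(N + 1) if abs(z - mu) >= zeta * mu)

def chernoff(mu, zeta):
    """Two-sided Chernoff bound 2 exp(-zeta^2 mu / 3), valid for 0 < zeta <= 1."""
    return 2 * math.exp(-zeta ** 2 * mu / 3)
```

For small $\zeta$ the bound exceeds $1$ and says nothing; it becomes useful once $\zeta^2\mu$ is large, which is why the argument insists on $px \ge n^{k-j+\delta}$.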


Claim 20. For each $0 \le \ell \le j-1$, whp any set $L$ of $\ell$ vertices is contained in at least $\Omega(\lambda n^{j-\ell})$ $j$-sets of any component of size at least $\Lambda$.


Proof. We assume that the high probability event from Corollary 19 holds. Consider a component $C$ with $\beta n^j$ $j$-sets, where $\beta \ge \lambda$. We prove by induction on $\ell$ that any $\ell$-set is contained in at least $\big(2\binom{k}{j}\big)^{-\ell}\beta n^{j-\ell}$ $j$-sets of $C$. The case $\ell = 0$ simply reasserts the fact that $C$ has size $\beta n^j$, so assume $\ell \ge 1$.

Now let $L' \subset L$ be a set of $\ell-1$ vertices, which by the inductive hypothesis lies in $\big(2\binom{k}{j}\big)^{-(\ell-1)}\beta n^{j-\ell+1}$ $j$-sets of $C$. Then for each such $j$-set $J$ there are at least $\binom{n-j-1}{k-j-1}$ $k$-sets containing $J \cup L$ (more if $L \subseteq J$), and each of these $k$-sets will be counted in this way at most $\binom{k}{j}$ times. Thus the number of queries from $j$-sets in $C$ to a $k$-set containing $L$ is at least

$$\Big(2\tbinom{k}{j}\Big)^{-(\ell-1)}\beta n^{j-\ell+1}\binom{n-j-1}{k-j-1}\binom{k}{j}^{-1} \ge (1-o(1))\frac{\beta n^{k-\ell}}{2^{\ell-1}\binom{k}{j}^{\ell}(k-j-1)!}.$$

Since $\beta n^{k-\ell} \ge \lambda n^{k-j+1} \ge n^{k-j+\delta}$, we may apply Corollary 19, and thus the number of edges we discover is at least

$$(1-o(1))\,p\,\frac{\beta n^{k-\ell}}{2^{\ell-1}\binom{k}{j}^{\ell}(k-j-1)!}.$$

Since $L$ and $C$ were arbitrary, this proves the inductive step. □


Note that a substantially stronger version of Claim 20 follows (with a little care) from the smooth boundary lemma and our arguments in the remainder of this paper (particularly Lemma 24). We give this proof here because it is much more elementary and does not rely on the heavy machinery of the smooth boundary lemma.


Claim 21. Whp, $\mathcal{H}^k(n,p)$ has a giant component of size $(1\pm o(1))\frac{2\varepsilon}{c_0}\binom{n}{j}$. In particular, all other components have size $o(\varepsilon n^j)$.



Proof. Let $p_2 :=$ A i 1 °f_j ) + i and let $p_1$ be such that $p_1 + p_2 - p_1p_2 = (1+\varepsilon)p_0 = p$. Note that $p_1 = (1+\varepsilon')p_0$, where $\varepsilon' = (1+o(1))\varepsilon$, so $\mathcal{H}^k(n,p_1)$ also has $(1\pm o(1))\frac{2\varepsilon}{c_0}\binom{n}{j}$ $j$-sets in large components.

By Claim 20, each $(j-1)$-set lies in $\Omega(\lambda n)$ $j$-sets of any large component. (In particular, this means that the number of large components is already bounded by $O(1/\lambda)$.) Suppose we have two distinct large components. Pick a $(j-1)$-set $J'$ and consider the $(k-j+1)$-uniform link hypergraph of $J'$, i.e. the hypergraph on $[n]\setminus J'$ whose edges are all $(k-j+1)$-sets which, together with $J'$, form an edge of $\mathcal{H}^k(n,p)$. This has two distinct vertex components, each with $\Omega(\lambda n)$ vertices. There are therefore $\Omega(\lambda^2 n^{k-j+1})$ possible $(k-j+1)$-sets which meet both.

By the choice of $p_1$ we have $\mathcal{H}^k(n,p_1) \cup \mathcal{H}^k(n,p_2) = \mathcal{H}^k(n,p)$ (in distribution). By adding in another round of exposure with probability $p_2$, the probability that these two components of $\mathcal{H}^k(n,p_1)$ do not merge is at most

$$(1-p_2)^{\Omega(\lambda^2 n^{k-j+1})} \le \exp\big(-\Omega(p_2\lambda^2 n^{k-j+1})\big) \le \exp\big(-(\log n)^{3/2}\big).$$


Now suppose the large components of $\mathcal{H}^k(n,p_1)$ are $C_1,\dots,C_s$, where $s = O(1/\lambda)$. Then the probability that at least one of these components does not merge with, say, $C_1$ is at most

$$s\exp\big(-(\log n)^{3/2}\big) \le \exp\big(-(\log n)^{3/2} + O(\log n)\big) = o(1).$$

Thus there is a single component of size $(1\pm o(1))\frac{2\varepsilon}{c_0}\binom{n}{j}$, while this is also the number of $j$-sets in large components; i.e. sprinkling the edges with probability $p_2$ may have created more large components, but they can only have total size $o(\varepsilon n^j)$. □
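The two elementary facts behind sprinkling can be checked numerically. Choosing $p_1 = (p-p_2)/(1-p_2)$ gives $(1-p_1)(1-p_2) = 1-p$, so each $k$-set is an edge of the union of the two rounds with probability exactly $p$, and $(1-p_2)^M \le \exp(-p_2M)$ for any number $M$ of candidate $k$-sets. The Python sketch below uses illustrative values of $p_0$, $\varepsilon$ and $p_2$ (not the $p_2$ of the proof) to show that a small $p_2$ leaves $\varepsilon' = p_1/p_0 - 1$ close to $\varepsilon$.

```python
import math

def first_round_probability(p, p2):
    """p1 such that p1 + p2 - p1*p2 = p, i.e. (1 - p1) * (1 - p2) = 1 - p."""
    return (p - p2) / (1 - p2)

def no_merge_probability(p2, M):
    """Probability that none of M candidate k-sets becomes an edge in the second round."""
    return math.exp(M * math.log1p(-p2))
```

Since $\log(1-p_2) < -p_2$, the exact no-merge probability always sits below the exponential bound used in the proof.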


6. Smooth Boundary: Proof of Lemma 16

We will prove Lemma 16 by induction on $\ell$. The case $\ell = 0$ is trivially true, since the empty set is contained in all edges of the boundary. (Indeed, this would hold even with $\gamma_0 = 0$, but choosing $\gamma_0$ as we did will be convenient later on.) From now on we therefore assume that $\ell \ge 1$ and that the statement holds for every $\ell' < \ell$.


The statement of Lemma 16 says that with probability $1-\exp(-\Theta(n^{\delta/4}))$, a certain property must hold for every initial $j$-set $J$. We note that it is enough to show this for a single initial $j$-set; the full generality is then implied by a union bound, since $\binom{n}{j}\exp(-\Theta(n^{\delta/4})) = \exp(-\Theta(n^{\delta/4}))$. Therefore from now on we fix an initial $j$-set $J_0$, and for simplicity we denote $\partial C_{J_0}(i)$ by $\partial(i)$ for any $i$.

Recall that for $1 \le \ell \le j-1$ we defined $i_0(\ell)$ to be the first generation $i$ for which $|\partial(i)| \ge n^{\ell+\delta}$, which is when we aim to start the process of smoothing the degrees of $\ell$-sets, while

$$i^*(\ell) = \left\lceil\frac{(j-\ell)\log n}{-\log\big((1+2\eta+2\varepsilon)r_\ell\big)}\right\rceil$$

is the time taken for this smoothing process. Then $i_1(\ell) = i_0(\ell) + i^*(\ell)$.

We will first show that $i_1(\ell) < i_0(\ell+1)$ for each $1 \le \ell \le j-2$, i.e. we finish the smoothing process for the $\ell$-sets before we start the smoothing process for the $(\ell+1)$-sets. (This is important because of the inductive nature of the proof.)

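So observe that by Corollary 11 and Remark 17 we have

$$\begin{aligned}
|\partial(i_1(\ell))| &\le |\partial(i_0(\ell))|\big((1+\varepsilon)(1+2\varepsilon_*)\big)^{i^*(\ell)}\\
&\le 2n^{\ell+\delta}\exp(2\varepsilon i^*(\ell))\\
&\le 2n^{\ell+\delta}\exp(O(\varepsilon\log n))\\
&\le n^{\ell+2\delta} < n^{\ell+1},
\end{aligned}$$

and therefore we have not yet reached generation $i_0(\ell+1)$.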

We now prove the inductive step in Lemma 16 with the help of a number of preliminary claims. Our strategy is as follows. We fix $i$ and examine how the $j$-sets of $\partial(i)$ can influence the degree of $L$ in $\partial(i+1)$. To simplify notation, we write $d_L(i)$ instead of $d_L(\partial(i))$.

• A jump to $L$ occurs when we query a $k$-set containing $L$ from a $j$-set which did not contain $L$ and the $k$-set forms an edge of $\mathcal{H}^k(n,p)$. Such an edge contributes at most $\binom{k-\ell}{j-\ell}$ to $d_L(i+1)$.

• A neighbourhood branching at $L$ occurs when we query any $k$-set from a $j$-set containing $L$ and it forms an edge. Such an edge contributes at most $\binom{k-\ell}{j-\ell}-1$ to $d_L(i+1)$.
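Both bounds come from counting the $j$-subsets of an edge $e \supseteq L$ that contain $L$: there are $\binom{k-\ell}{j-\ell}$ of them, and for a neighbourhood branching one of them is the querying $j$-set itself. The following Python sketch (helper names hypothetical) verifies this by brute force for small $k$, $j$, $\ell$. It counts the maximum possible contribution, ignoring that some of these $j$-sets may already have been explored, which is why the text says “at most”.

```python
from itertools import combinations
from math import comb

def new_jsets_containing(L, J, edge, j):
    """j-subsets of `edge` that contain L, excluding the querying j-set J."""
    return [S for S in combinations(sorted(edge), j) if L <= set(S) and set(S) != J]

def check_contributions(k, j, ell):
    """Check both counts for every l-set L and querying j-set J inside one edge."""
    edge = set(range(k))
    for L in map(set, combinations(range(k), ell)):
        for J in map(set, combinations(range(k), j)):
            count = len(new_jsets_containing(L, J, edge, j))
            if L <= J:            # neighbourhood branching at L
                assert count == comb(k - ell, j - ell) - 1
            else:                 # jump to L
                assert count == comb(k - ell, j - ell)
    return True
```

For $\ell = 0$ every query is a neighbourhood branching at $\emptyset$ and the count is $\binom{k}{j}-1 = c_0$, matching the offspring multiplier of the branching processes.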

We aim to show the following:

• The contribution to $d_L(i+1)$ made by jumps to $L$ is approximately the same for each $L$ (Claim 22).

• The contribution to $d_L(i+1)$ made by neighbourhood branchings at $L$ tends to be smaller than $d_L(i)$ (Claim 23).

• After sufficiently many steps, the degree in the boundary is approximately smooth.

The first two steps are essentially concentration of probability arguments, which can only hold with high probability once the boundary is large. This is the reason why we introduced a starting time $i_0(\ell)$. Furthermore, we need to know that this smoothing process finishes before the stopping conditions of the search process are reached, i.e. that $i_1 \ge i_1(j-1)$. We therefore need an additional technical lemma:

• The component grown up to the start of the smoothing process is small (Lemma 24).
At this point we fix $1 \le \ell \le j-1$ and an $\ell$-set $L$. We will show that the degree of $L$ is well-concentrated with a sufficiently high probability that, at the end of the argument, we can apply a union bound over all choices of $L$. We first show that the contribution of jumps is likely to be well-concentrated around its mean.


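Claim 22 (Smooth jumps). Suppose $|\partial(i)| \ge n^{\ell+\delta}$. Then with probability at least $1-\exp(-\Theta(n^{\delta/2}))$, the contribution to $d_L(i+1)$ made by jumps to $L$ is

$$(1\pm 2\gamma_{\ell-1})(1+\varepsilon)\frac{\binom{k-\ell}{j-\ell}\Big(\binom{k}{\ell}-\binom{j}{\ell}\Big)|\partial(i)|}{c_0\binom{n}{\ell}} - O\Big(\frac{d_L(i)}{n}\Big).$$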

Before we prove this claim, we provide an argument for why the main term should intuitively be the correct contribution from jumps to the degree of $L$. To see this, we consider the number of jumps we expect to all $\ell$-sets in generation $i+1$. We have $|\partial(i)|$ $j$-sets in generation $i$, from each of which we may query approximately $\binom{n}{k-j}$ $k$-sets, and each forms an edge with probability $p$. Thus we expect to find about

$$p\binom{n}{k-j}|\partial(i)| = \frac{1+\varepsilon}{c_0}|\partial(i)|$$

edges. On the whole, an edge results in a jump to $\binom{k}{\ell}-\binom{j}{\ell}$ $\ell$-sets (any which are contained in the edge but not in the $j$-set from which we made the query). Since there are $\binom{n}{\ell}$ $\ell$-sets in total, the average number of jumps to an $\ell$-set “should” be about

$$\frac{1+\varepsilon}{c_0}\cdot\frac{\binom{k}{\ell}-\binom{j}{\ell}}{\binom{n}{\ell}}\,|\partial(i)|.$$

Finally, for most jumps to $L$, the number of $j$-sets containing $L$ which become active is $\binom{k-\ell}{j-\ell}$.


Proof. We consider how many ways we might jump to $L$. Given any $L' \subsetneq L$, there are $d_{L'}(i) - d_L(i)$ $j$-sets containing $L'$ in $\partial(i)$ from which we might jump to $L$. However, this may include some $j$-sets which actually have a larger intersection with $L$ than $L'$. The number $d^*_{L',L}(i)$ of $j$-sets in $\partial(i)$ which have intersection exactly $L'$ with $L$ is therefore at most $d_{L'}(i) - d_L(i)$, and also at least

$$d_{L'}(i) - \sum_{L' \subsetneq L'' \subseteq L}d_{L''}(i).$$

Note that by the inductive hypothesis for Lemma 16, $d_{L'}(i)$ is of order $|\partial(i)|/n^{\ell'}$ (where $\ell' = |L'|$), and similarly for each $L' \subsetneq L'' \subsetneq L$. Furthermore $d_L(i) \le d_{L''}(i)$ by definition. Therefore the number of $j$-sets in $\partial(i)$ which intersect $L$ in exactly $L'$ is

$$d^*_{L',L}(i) = \begin{cases} d_{L'}(i) - d_L(i) & \text{if } \ell' = \ell-1,\\ (1-O(1/n))\,d_{L'}(i) & \text{otherwise.}\end{cases}$$


From each $j$-set $J$ intersecting $L$ in $L'$, we need a lower bound on the number of $k$-sets containing $L$ which we may query, and which contain $c_0$ neutral $j$-sets. So consider the number of such $k$-sets which are not permissible because they contain some other, non-neutral $j$-set $J'$. We make a case distinction based on the intersection $I := J' \cap (J \cup L)$. Let $\iota := |I|$. Then the number of non-neutral $j$-sets with this intersection is at most $O(\lambda n^{j-\iota})$ by Property (B).

The number of $k$-sets containing $L$, $J$ and $J'$ is of order $n^{k-2j-\ell+\ell'+\iota}$. Thus the number of non-permissible $k$-sets is at most $O(\lambda n^{k-j-\ell+\ell'})$. Note that this is independent of $\iota$. Furthermore, the number of possible choices for $I$ is at most

$$\sum_{\iota}\binom{j+\ell-\ell'}{\iota} = O(1).$$

On the other hand, the total number of possible $k$-sets containing $J \cup L$ is $\binom{n-j-\ell+\ell'}{k-j-\ell+\ell'}$, and so the number of queries we may make to $k$-sets containing $L$ and $c_0$ neutral $j$-sets is $(1-O(\lambda))\binom{n}{k-j-\ell+\ell'}$. The probability that each of these queries results in an edge is $p$. Thus the number of edges resulting in jumps from $L'$ to $L$ has distribution $\mathrm{Bi}(N,p)$, where

$$N = (1-O(\lambda))\binom{n}{k-j-\ell+\ell'}d^*_{L',L}(i).$$

Note that we use $N$ to denote both a lower bound on the number of queries which would result in jumps to $L$ and would contribute $\binom{k-\ell}{j-\ell}$ to the degree of $L$, and also an upper bound on the total number of queries which would result in jumps to $L$. Of course, technically these are different, but both satisfy the asymptotic formula above.

For simplicity, we will assume that

$$N \ge \Omega(n^{k-j+\delta})$$

in all cases. This certainly follows from the fact that $d^*_{L',L}(i) \ge \Omega(n^{\ell-\ell'+\delta})$ if $\ell' < \ell-1$. For $\ell' = \ell-1$, if $N = o(n^{k-j+\delta})$ then $d^*_{L',L}(i) = o(n^{1+\delta})$, and it must be that $d_L(i) = (1-o(1))d_{L'}(i) = \Theta(|\partial(i)|/n^{\ell-1})$; therefore the $O(d_L(i)/n)$ error term in the statement of Claim 22 is as large as the main term. The lower bound is therefore automatic and we only need to prove an upper bound, which will only become harder if we increase $N$.

By the Chernoff bound (Theorem 5), the probability that the number of such edges is not in $(1\pm\eta)pN$ is at most

$$2\exp(-\eta^2Np/3) \le \exp(-\Theta(\eta^2 n^\delta)) \le \exp(-\Theta(n^{\delta/2})).$$

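If this unlikely event does not occur, then the contribution to $d_L(i+1)$ made by jumps from $L'$ to $L$ is

$$(1\pm\eta)Np\binom{k-\ell}{j-\ell} = (1\pm 2\eta)(1+\varepsilon)\frac{\binom{k-\ell}{j-\ell}}{c_0}\cdot\frac{(k-j)!}{(k-j-\ell+\ell')!}\cdot\frac{d^*_{L',L}(i)}{n^{\ell-\ell'}}.$$

We now sum over all such sets $L' \subsetneq L$ and use the fact that

$$d^*_{L',L}(i) = (1-O(1/n))\,d_{L'}(i) - O(d_L(i))$$

in all cases. We also use the induction hypothesis from Lemma 16, which tells us that

$$d_{L'}(i) = (1\pm\gamma_{\ell'})\frac{\binom{j}{\ell'}|\partial(i)|}{\binom{n}{\ell'}} \qquad\text{for each } L' \subsetneq L \text{ with } |L'| = \ell',$$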
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and deduce that the contribution to dz,(i + 1) made by jumps to L is 


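$$\begin{aligned}
&\sum_{\ell'=0}^{\ell-1}\ \sum_{\substack{L'\subset L\\ |L'|=\ell'}}\binom{k-\ell}{j-\ell}\frac{(1\pm2\eta)(1+\varepsilon)}{c_0}\cdot\frac{(k-j)!\,d^*_{L',L}(i)}{(k-j-\ell+\ell')!\,n^{\ell-\ell'}}\\
&\qquad= (1\pm3\eta)\frac{1+\varepsilon}{c_0}\binom{k-\ell}{j-\ell}\sum_{\ell'=0}^{\ell-1}\binom{\ell}{\ell'}\frac{(k-j)!}{(k-j-\ell+\ell')!\,n^{\ell-\ell'}}(1\pm\gamma_{\ell'})\frac{j!}{(j-\ell')!}\cdot\frac{|\partial(i)|}{n^{\ell'}} - O\left(\frac{d_L(i)}{n}\right)\\
&\qquad= (1\pm2\gamma_{\ell-1})\frac{(1+\varepsilon)f}{c_0}\cdot\frac{|\partial(i)|}{\binom{n}{\ell}} - O\left(\frac{d_L(i)}{n}\right),
\end{aligned}$$

where (terms with $k-j-\ell+\ell'<0$ being zero)

$$\begin{aligned}
f = f(\ell,j,k) &:= \frac{1}{\ell!}\binom{k-\ell}{j-\ell}\sum_{\ell'=0}^{\ell-1}\binom{\ell}{\ell'}\frac{(k-j)!\,j!}{(k-j-\ell+\ell')!\,(j-\ell')!}\\
&= \frac{j!}{\ell!\,(j-\ell)!}\sum_{\ell'=0}^{\ell-1}\binom{\ell}{\ell'}\binom{k-\ell}{j-\ell'}\\
&= \binom{j}{\ell}\left(\binom{k}{j}-\binom{k-\ell}{j-\ell}\right),
\end{aligned}$$

the last step by Vandermonde's identity $\sum_{\ell'=0}^{\ell}\binom{\ell}{\ell'}\binom{k-\ell}{j-\ell'}=\binom{k}{j}$, as required.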


□ 
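The closed form for $f$ rests on Vandermonde's identity $\sum_{\ell'=0}^{\ell}\binom{\ell}{\ell'}\binom{k-\ell}{j-\ell'}=\binom{k}{j}$. The sketch below checks, in exact rational arithmetic and for small $k$, $j$, $\ell$ with $1\le\ell\le j-1$, that $\frac{1}{\ell!}\binom{k-\ell}{j-\ell}\sum_{\ell'<\ell}\binom{\ell}{\ell'}\frac{(k-j)!\,j!}{(k-j-\ell+\ell')!\,(j-\ell')!}$ equals $\binom{j}{\ell}\bigl(\binom{k}{j}-\binom{k-\ell}{j-\ell}\bigr)$. The function names are hypothetical, and terms with $k-j-\ell+\ell'<0$ are treated as zero.

```python
from fractions import Fraction
from math import comb, factorial


def f_as_sum(k: int, j: int, l: int) -> Fraction:
    """(1/l!) C(k-l, j-l) sum_{l'<l} C(l, l') (k-j)! j! / ((k-j-l+l')! (j-l')!)."""
    total = Fraction(0)
    for lp in range(l):
        m = k - j - l + lp
        if m < 0:
            # No k-set can contain J and L here; the matching binomial
            # C(k-l, j-l') in the closed form is also 0.
            continue
        total += Fraction(comb(l, lp) * factorial(k - j) * factorial(j),
                          factorial(m) * factorial(j - lp))
    return total * comb(k - l, j - l) / factorial(l)


def f_closed(k: int, j: int, l: int) -> int:
    """C(j, l) * (C(k, j) - C(k-l, j-l))."""
    return comb(j, l) * (comb(k, j) - comb(k - l, j - l))


if __name__ == "__main__":
    for k in range(2, 10):
        for j in range(1, k):
            for l in range(1, j):  # l = 1, ..., j-1
                assert f_as_sum(k, j, l) == f_closed(k, j, l), (k, j, l)
    print("identity verified for all k < 10")
```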


Note that the effect that $d_L(i)$ has on the number of jumps to $L$ in generation $i+1$ is very small (the $O(d_L(i)/n)$ term). Next we show that the effect that $d_L(i)$ has on the number of neighbourhood branchings at $L$ in generation $i+1$, while significantly larger, is still likely to be smaller than $d_L(i)$. Let $c_\ell := \binom{k-\ell}{j-\ell} - 1$ for $\ell = 1,\ldots,j-1$ (and observe that this is consistent with the previous definition of $c_0$). Recall that $r_\ell = c_\ell/c_0 < 1$ for $\ell \ge 1$.
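For concreteness, the following sketch tabulates $c_\ell=\binom{k-\ell}{j-\ell}-1$ and $r_\ell=c_\ell/c_0$ for small $(k,j)$ and checks that $0<r_\ell<1$ for $1\le\ell\le j-1$, which is what makes neighbourhood branchings contract. Taking $r_\ell=c_\ell/c_0$ follows the expectation computed in the proof of Claim 23; the helper name `branching_ratios` is hypothetical.

```python
from fractions import Fraction
from math import comb


def branching_ratios(k: int, j: int) -> dict[int, Fraction]:
    """Return r_l = c_l / c_0 with c_l = C(k-l, j-l) - 1, for l = 1, ..., j-1."""
    c0 = comb(k, j) - 1
    return {l: Fraction(comb(k - l, j - l) - 1, c0) for l in range(1, j)}


if __name__ == "__main__":
    for k in range(3, 8):
        for j in range(2, k):
            ratios = branching_ratios(k, j)
            assert all(0 < r < 1 for r in ratios.values()), (k, j)
            print(k, j, {l: str(r) for l, r in ratios.items()})
```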

Claim 23 (Neighbourhood branchings contract). With probability at least $1-\exp(-\Theta(n^{\delta/4}))$, the contribution to $d_L(i+1)$ made by neighbourhood branchings at $L$ is

$$\begin{cases}(1\pm\eta)(1+\varepsilon)r_\ell\, d_L(i) & \text{if } d_L(i) \ge n^{\delta/3},\\ \le (1+\eta)(1+\varepsilon)r_\ell\, n^{\delta/3} & \text{otherwise.}\end{cases}$$

Note that the “bad” probability in this claim is larger than in the other results we have stated. For technical reasons, we will need to apply Claim 23 for smaller boundaries than, for example, Claim 22.
Proof. Let $D$ denote the contribution to $d_L(i+1)$ made by neighbourhood branchings at $L$. We first prove the upper bound. From every $j$-set containing $L$, we make at most $\binom{n-j}{k-j}$ queries, and each time we discover an edge in this way, it has a contribution of at most $c_\ell$ to $d_L(i+1)$. Thus $D$ is stochastically dominated by a random variable $Y$ with distribution $c_\ell\cdot\mathrm{Bi}\left(\binom{n-j}{k-j}d_L(i),p\right)$, which has expectation

$$\mathbb{E}(Y) = c_\ell\binom{n-j}{k-j}d_L(i)\,p = (1+\varepsilon)\frac{c_\ell}{c_0}d_L(i) = (1+\varepsilon)r_\ell\, d_L(i).$$

Thus when $d_L(i) \ge n^{\delta/3}$, by the Chernoff bound (Theorem 5) applied to $Y$ we have

$$\begin{aligned}
\Pr\left(D \ge (1+\eta)(1+\varepsilon)r_\ell d_L(i)\right) &\le \Pr\left(Y \ge (1+\eta)(1+\varepsilon)r_\ell d_L(i)\right)\\
&\le \exp\left(-\eta^2(1+\varepsilon)r_\ell d_L(i)/3\right)\\
&\le \exp\left(-\Theta(\eta^2 d_L(i))\right)\\
&\le \exp\left(-\Theta(n^{\delta/4})\right)
\end{aligned}$$

since $\eta \ge n^{-\delta/24}$. This proves the upper bound in the case that $d_L(i) \ge n^{\delta/3}$. In the second case, we simply take $c_\ell\cdot\mathrm{Bi}\left(\binom{n-j}{k-j}n^{\delta/3},p\right)$ as a dominating variable and a similar calculation holds. This proves the upper bound in both cases.

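For the lower bound, we observe that since $\mathcal{T}_* \prec \mathrm{BFS}$, the number of queries that would result in exactly $c_\ell$ branchings that we make from each $j$-set is at least $(1-\varepsilon_*)\binom{n-j}{k-j}$. Thus $D$ dominates a random variable $Z$ with distribution $c_\ell\cdot\mathrm{Bi}\left((1-\varepsilon_*)\binom{n-j}{k-j}d_L(i),p\right)$. A similar calculation shows us that

$$\begin{aligned}
\Pr\left(D \le (1-\eta)(1+\varepsilon)r_\ell d_L(i)\right) &\le \Pr\left(Z \le (1-\eta)(1+\varepsilon)r_\ell d_L(i)\right)\\
&\overset{\text{(Thm 5)}}{\le} \exp\left(-(\eta-\varepsilon_*)^2(1+\varepsilon)(1-\varepsilon_*)r_\ell d_L(i)/2\right)\\
&\le \exp\left(-\Theta(\eta^2 d_L(i))\right)\\
&\le \exp\left(-\Theta(n^{\delta/4})\right). \qquad\square
\end{aligned}$$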


We now complete the proof of Lemma 16. We have already observed that $i_0(\ell+1) \ge i_1(\ell)$ and that the statement of Lemma 16 holds for $\ell = 0$, so by the inductive hypothesis we assume that the conclusion of Lemma 16 holds for $\ell' = 0,\ldots,\ell-1$ and for all generations $i \in [i_0(\ell), i_1]$. By Corollary 11 we know that for this range of $i$, $\partial(i)$ is large enough to apply all of our concentration results.

Pick any $i'(\ell) \in [i_0(\ell), i_1]$. For $s \ge 0$ we let $d_s := d_L(i'(\ell)+s)$ and let $d'_s := \max(d_s, n^{\delta/3})$. We assume that all of the high probability events of the previous claims hold (this occurs with probability at least $1-\exp(-\Theta(n^{\delta/4}))$).
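Then we have

$$\begin{aligned}
d_s &\le (1+\eta)(1+\varepsilon)r_\ell\, d'_{s-1} + (1+2\gamma_{\ell-1})(1+\varepsilon)\frac{\binom{j}{\ell}\left(\binom{k}{j}-\binom{k-\ell}{j-\ell}\right)}{c_0}\cdot\frac{|\partial(i'(\ell)+s-1)|}{\binom{n}{\ell}}\\
&= (1+\eta)(1+\varepsilon)r_\ell\, d'_{s-1} + (1+2\gamma_{\ell-1})\,g\,|\partial(i'(\ell)+s-1)|,
\end{aligned}$$

where

$$g = g(k,j,\ell,n,\varepsilon) := (1+\varepsilon)\frac{\binom{j}{\ell}\left(\binom{k}{j}-\binom{k-\ell}{j-\ell}\right)}{c_0}\binom{n}{\ell}^{-1}.$$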
Note that $g$ is not dependent on $s$, and that $(1+\eta)(1+\varepsilon)r_\ell < 1$. If $d'_{s-1} = d_{s-1}$, we apply the same inequality with an index shift, and keep iterating until we have some $d_{s'} < n^{\delta/3}$ or else we reach $s' = 0$. In this way we obtain

$$\begin{aligned}
d_s &\le (1+\eta)^s(1+\varepsilon)^s r_\ell^s d_0 + n^{\delta/3} + (1+2\gamma_{\ell-1})\,g\sum_{s'=1}^{s}(1+\eta)^{s'-1}(1+\varepsilon)^{s'-1}r_\ell^{s'-1}|\partial(i'(\ell)+s-s')|\\
&\le (1+\eta)^s(1+\varepsilon)^s r_\ell^s d_0 + (1+2\gamma_{\ell-1})\,g\,(1+\eta)^s\left(\sum_{s'=1}^{s}(1+\varepsilon)^{s'-1}r_\ell^{s'-1}|\partial(i'(\ell)+s-s')|\right) + n^{\delta/3}.
\end{aligned}$$


To calculate the corresponding lower bound we cannot use $(1-\eta)(1+\varepsilon)r_\ell d_{s-1}$ from Claim 23, since it may be that $d_{s-1} < n^{\delta/3}$, in which case that claim does not give us any lower bound. Instead we use the lower bound (in all cases) of $(1-\eta)(1+\varepsilon)r_\ell d_{s-1} - n^{\delta/3}$, which may be negative but which will turn out to be good enough. Thus we have

$$\begin{aligned}
d_s &\ge (1-\eta)(1+\varepsilon)r_\ell\, d_{s-1} - n^{\delta/3} + (1-2\gamma_{\ell-1})(1+\varepsilon)\frac{\binom{j}{\ell}\left(\binom{k}{j}-\binom{k-\ell}{j-\ell}\right)}{c_0}\cdot\frac{|\partial(i'(\ell)+s-1)|}{\binom{n}{\ell}}\\
&\ge (1-2\eta)(1+\varepsilon)r_\ell\, d_{s-1} + (1-3\gamma_{\ell-1})\,g\,|\partial(i'(\ell)+s-1)|.
\end{aligned}$$



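Note that we have absorbed the $-n^{\delta/3}$ term by modifying the $\gamma_{\ell-1}$ term. This is permissible since $|\partial(i)| \ge n^{\ell+\delta}$, and so $\gamma_{\ell-1}|\partial(i)|/n^{\ell} \gg n^{\delta/3}$. Iterating gives

$$\begin{aligned}
d_s &\ge (1-2\eta)^s(1+\varepsilon)^s r_\ell^s d_0 + (1-3\gamma_{\ell-1})\,g\sum_{s'=1}^{s}(1-2\eta)^{s'-1}(1+\varepsilon)^{s'-1}r_\ell^{s'-1}|\partial(i'(\ell)+s-s')|\\
&\ge (1-2\eta)^s(1+\varepsilon)^s r_\ell^s d_0 + (1-3\gamma_{\ell-1})\,g\,(1-2\eta)^s\left(\sum_{s'=1}^{s}(1+\varepsilon)^{s'-1}r_\ell^{s'-1}|\partial(i'(\ell)+s-s')|\right).
\end{aligned}$$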

Observe that only the first term of both the upper and lower bound on $d_s$ depends on $L$. Additionally, for $s \ge i_*(\ell)$, we have

$$0 \le (1-2\eta)^s(1+\varepsilon)^s r_\ell^s d_0 \le (1+\eta)^s(1+\varepsilon)^s r_\ell^s d_0 \le \left((1+2\eta+2\varepsilon)r_\ell\right)^{i_*(\ell)}n^{j-\ell} \le 1,$$

since $d_0 \le n^{j-\ell}$ and we chose

$$i_*(\ell) = \frac{(j-\ell)\log n}{-\log\left((1+2\eta+2\varepsilon)r_\ell\right)}.$$

In other words, $d_0$, the degree of $L$ at time $i'(\ell)$, only has an influence of at most one by time $i'(\ell)+s$, which will not affect calculations significantly, so we ignore it.
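To see the effect of this choice numerically, the sketch below computes $i_*(\ell)=(j-\ell)\log n/(-\log((1+2\eta+2\varepsilon)r_\ell))$, rounded up to an integer (which only strengthens the bound), and checks in log space that $((1+2\eta+2\varepsilon)r_\ell)^{i_*(\ell)}n^{j-\ell}\le 1$, assuming $r_\ell=(\binom{k-\ell}{j-\ell}-1)/(\binom{k}{j}-1)$. The values of $n$, $\eta$ and $\varepsilon$ are illustrative assumptions only.

```python
import math
from math import comb


def contraction(k: int, j: int, l: int, eta: float, eps: float) -> float:
    """(1 + 2 eta + 2 eps) r_l, with r_l = (C(k-l, j-l) - 1) / (C(k, j) - 1)."""
    r = (comb(k - l, j - l) - 1) / (comb(k, j) - 1)
    return (1 + 2 * eta + 2 * eps) * r


def i_star(n: float, k: int, j: int, l: int, eta: float, eps: float) -> int:
    """i_*(l) = (j - l) log n / (-log((1+2eta+2eps) r_l)), rounded up to an integer."""
    base = contraction(k, j, l, eta, eps)
    if base >= 1:
        raise ValueError("the contraction factor must be below 1")
    return math.ceil((j - l) * math.log(n) / -math.log(base))


def log_damped_degree(n: float, k: int, j: int, l: int, eta: float, eps: float) -> float:
    """log of ((1+2eta+2eps) r_l)^{i_*(l)} * n^{j-l}, bounding d_0's remaining influence."""
    s = i_star(n, k, j, l, eta, eps)
    return s * math.log(contraction(k, j, l, eta, eps)) + (j - l) * math.log(n)


if __name__ == "__main__":
    n, eta, eps = 1e9, 0.01, 0.01  # illustrative values
    for k, j in [(3, 2), (4, 3), (6, 4)]:
        for l in range(1, j):
            print(k, j, l, i_star(n, k, j, l, eta, eps),
                  math.exp(log_damped_degree(n, k, j, l, eta, eps)))
```

Working in log space avoids overflow in $n^{j-\ell}$ for large $n$; since $i_*(\ell)=O(\log n)$, the damping happens after only logarithmically many generations.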

The remaining upper and lower bounds do not depend on $L$. Furthermore, observing that $g = \Theta(n^{-\ell})$ and that $|\partial(i)| \ge \Omega(n^{\ell+\delta})$, we have $n^{\delta/3} = O(n^{-2\delta/3}g|\partial(i)|)$, and so the remaining upper and lower bounds differ by a multiplicative factor of at most

$$\frac{(1+O(n^{-2\delta/3}))(1+\eta)^s(1+2\gamma_{\ell-1})}{(1-2\eta)^s(1-3\gamma_{\ell-1})} \le (1+5\eta)^s(1+6\gamma_{\ell-1})$$


< (1 + 
By definition we have

$$1 + 7\gamma_{\ell-1} \le 1 + \gamma_\ell.$$

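Taking a union bound over all sets $L$ of size $\ell$ and all possible starting generations $i'(\ell)$, we may say that with probability at least $1-\exp(-\Theta(n^{\delta/4}))$, all $\ell$-sets have asymptotically the same degree in $\partial(i)$. More precisely we have

$$\sum_{L} d_L(i) = \binom{j}{\ell}|\partial(i)|,$$

which implies that

$$\frac{1}{\binom{n}{\ell}}\sum_{L} d_L(i) = \binom{j}{\ell}\frac{|\partial(i)|}{\binom{n}{\ell}}.$$

We further have, for any particular $\ell$-set $L_0$, that by the arguments above

$$\frac{1}{1+\gamma_\ell}\cdot\frac{1}{\binom{n}{\ell}}\sum_{L} d_L(i) \le d_{L_0}(i) \le (1+\gamma_\ell)\frac{1}{\binom{n}{\ell}}\sum_{L} d_L(i),$$

and since $1/(1+\gamma_\ell) \ge 1-\gamma_\ell$, the conclusion follows. The proof of Lemma 17771 is now complete except for the proof of Lemma 15. □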

Note that the introduction of $i'(\ell)$, which can be any generation after the fixed one $i_0(\ell)$, ensures that our concentration does not get worse further along the process due to compounding error terms. Instead, we can imagine starting the smoothing process at any time, and choose the one that gives us the best possible concentration. Of course, in reality this means that once things are smooth, they remain smooth.

7. Smoothness at stopping time: Proof of Lemma 15

Recall that $\mathcal{E}$ is the event that stopping conditions (S2) or (S3) were implemented, i.e. $C_{J_1}$ became large or its boundary became large before the component was fully explored.

In order to complete the proof of Lemma ED we need to know that, conditioned on $\mathcal{E}$, with high probability $i_1 \ge i_1(j-1)$, i.e. the smoothing process is finished before the stopping conditions are reached. If this were not true, then Lemma 16 would be an empty statement and therefore of no help.

In order to prove that it is not, we will need the following two technical lemmas. The first roughly states that the initial expansion happens fast enough that the portion of the component discovered before the start of the smoothing process (for $(j-1)$-sets) is negligible.

Lemma 24. With probability at least $1-\exp(-\Theta(n^{\delta/2}))$, for every $\ell = 0,\ldots,j-1$, we have

$$\Delta_\ell\left(C_{J_1}(i_0(j-1))\right) = o(\lambda n^{j-\ell}).$$

The second lemma tells us that there will be enough time between the start of the smoothing process and the stopping conditions being reached for the smoothing process to complete.

Lemma 25. Conditioned on $\mathcal{E}$, with probability at least $1-\exp(-\Theta(n^{\delta/2}))$, we have $i_1 - i_0(j-1) \ge \frac{\delta}{2}\varepsilon^{-1}\log n$.
We will prove these two lemmas in Sections 7.2 and 7.1 respectively. We first show how together they imply Lemma 15.

We note that by definition $i_1(j-1) - i_0(j-1) = i_*(j-1) = O(\log n)$, where the constant depends only on $k$, $j$, and therefore this is substantially smaller than $\varepsilon^{-1}\log n$. Thus conditioned on $\mathcal{E}$, with probability at least $1-\exp(-\Theta(n^{\delta/2}))$ we have $i_1 \ge i_1(j-1)$ and therefore Lemma 15 follows directly from Lemma 16. Thus the proof of Lemma [El and therefore of Theorem 2 is complete except for the proof of Lemmas 24 and 25. □


7.1. Proof of Lemma 25. In this section we will use Lemma 24 to prove Lemma 25. The expansion from one generation to the next is at most $(1+2\varepsilon_*)(1+\varepsilon) \le 1+2\varepsilon$ by Corollary 11. Let $x := i_1 - i_0(j-1)$. Suppose first that we hit the stopping condition (S3), i.e. $|\partial C_{J_1}| \ge \lambda^2 n^j$. Since by Remark 13 $|\partial(i_0(j-1))| \le 2n^{j-1+\delta}$, we then have $(1+2\varepsilon)^x \ge \lambda^2 n^{1-\delta}/2$, which gives

$$x \ge \frac{\log(\lambda^2 n^{1-\delta}/2)}{\log(1+2\varepsilon)} \ge \frac{\log(n^{\delta})}{2\varepsilon} = \frac{\delta}{2}\varepsilon^{-1}\log n.$$


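On the other hand suppose we hit the stopping condition $|C_{J_1}| \ge \lambda n^j$. Then Lemma 24 (for $\ell = 0$) implies that up to time $i_0(j-1)$ we have only seen $o(\lambda n^j)$ $j$-sets in total, and thus by Remark [H] we have

$$\sum_{i=0}^{x}(1+2\varepsilon)^i\, 2n^{j-1+\delta} \ge (1-o(1))\lambda n^j.$$

Since $\sum_{i=0}^{x}(1+2\varepsilon)^i = \frac{(1+2\varepsilon)^{x+1}-1}{2\varepsilon}$, this gives

$$\frac{(1+2\varepsilon)^{x+1}-1}{2\varepsilon} \ge (1-o(1))\frac{\lambda n^{1-\delta}}{2},$$

so

$$(1+2\varepsilon)^x \ge (1-o(1))\varepsilon\lambda n^{1-\delta} \ge n^{\delta},$$

that is, $x\log(1+2\varepsilon) \ge \delta\log n$, and hence

$$x \ge \frac{\delta\log n}{\log(1+2\varepsilon)} \ge \frac{\delta}{2}\varepsilon^{-1}\log n$$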


as required. Note that in all the Lemmas, Corollaries and Remarks that we used here, the “good” event holds with probability $1-\exp(-\Theta(n^{\delta/2}))$, and a union bound over all failure probabilities still leaves a probability of at least $1-\exp(-\Theta(n^{\delta/2}))$.

□ 
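Both cases reduce to the elementary inequality $\log(1+2\varepsilon)\le 2\varepsilon$. The sketch below, a numerical illustration rather than part of the proof, computes for illustrative values of $n$, $\varepsilon$, $\lambda$ and $\delta$ (assumed to satisfy $\lambda^2n^{1-2\delta}\ge 2$ and $\varepsilon\lambda n^{1-2\delta}\ge 1$) the number of generations each stopping condition forces, ignoring the $o(1)$ terms, and compares it with $\frac{\delta}{2}\varepsilon^{-1}\log n$.

```python
import math


def generations_for_boundary(n: float, eps: float, lam: float, delta: float) -> float:
    """Case (S3): least real x with (1 + 2eps)^x >= lam^2 n^(1-delta) / 2."""
    return math.log(lam ** 2 * n ** (1 - delta) / 2) / math.log1p(2 * eps)


def generations_for_component(n: float, eps: float, lam: float, delta: float) -> float:
    """Case |C| >= lam n^j: least real x with sum_{i<=x} (1+2eps)^i 2n^(j-1+delta) >= lam n^j.

    The geometric sum gives (1+2eps)^(x+1) - 1 >= eps lam n^(1-delta); the factor
    n^(j-1) cancels, so j does not appear. The o(1) terms of the proof are ignored.
    """
    return math.log1p(eps * lam * n ** (1 - delta)) / math.log1p(2 * eps) - 1


def lemma25_bound(n: float, eps: float, delta: float) -> float:
    """(delta / 2) eps^{-1} log n."""
    return delta / (2 * eps) * math.log(n)


if __name__ == "__main__":
    n, eps, lam, delta = 1e12, 0.01, 1e-3, 0.1  # illustrative values
    print(generations_for_boundary(n, eps, lam, delta),
          generations_for_component(n, eps, lam, delta),
          lemma25_bound(n, eps, delta))
```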


7.2. Proof of Lemma 24. The critical tool in our proof of Lemma 24 is Lemma 26, which gives a lower bound on the probability that the process will become large within a certain number of steps. Recall that $\mathcal{T}_*$ is a branching process starting with one individual in the 0-th generation and whose offspring distribution $Z$ is the distribution of the random variable

$$c_0\cdot\mathrm{Bi}\left((1-\varepsilon_*)\binom{n-j}{k-j},p\right).$$

We have also seen that this provides a lower coupling on our actual search process during the time period that we are interested in.

Let $c > 0$ be some constant and let $\tau := (j-1+\delta+c)\varepsilon^{-1}\log n$. For any $i \in \mathbb{N}$ we denote by $|\partial_*(i)|$ the size of the $i$-th generation of $\mathcal{T}_*$.

Lemma 26. With probability at least $\varepsilon/n^c$, $|\partial_*(\tau)| \ge n^{j-1+\delta}$.
We aim to imitate a proof of Markov’s inequality. However, since we have no upper limit on the size of the boundary, we will need to limit the contribution that the upper tail makes to the expected size of the boundary.

In order to do this we need to bound the upper tail probabilities, preferably exponentially. For this we imitate a proof of a Chernoff bound, applying Markov’s inequality to the random variable $e^{\xi|\partial_*(\tau)|}$ for some well chosen $\xi$.

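We therefore need to calculate $\mathbb{E}(e^{\xi|\partial_*(\tau)|})$. Let $\varphi(z) := \sum_{t\ge0}e^{zt}\Pr(Z=t)$ be the moment generating function of $Z$. We also define

$$\xi := \frac{\varepsilon}{(1+\varepsilon)^{\tau}n^{c/2}}$$

and $C_i$ recursively by

$$C_0 := 1+\delta, \qquad C_i := 1 + (1+\delta)(i+1)\left(\frac{c_0}{2}C_{i-1}(1+\varepsilon)^i\xi + \varepsilon\right) \quad\text{for } i \ge 1.$$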

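Claim 27. For all $i$ we have

$$\mathbb{E}\left(e^{\xi|\partial_*(i)|}\right) = \varphi\left(\log\mathbb{E}\left(e^{\xi|\partial_*(i-1)|}\right)\right).$$

In particular,

$$\mathbb{E}\left(e^{\xi|\partial_*(i)|}\right) \le 1 + C_i(1+\varepsilon)^i\xi \tag{22}$$

for sufficiently large $n$ as long as

$$i(1+\varepsilon)^i\xi\log n = o(1). \tag{23}$$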

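Proof. The first part of the claim is a standard fact about moment generating functions, which can be proved by conditioning on the number of children in the first step:

$$\begin{aligned}
\mathbb{E}\left(e^{\xi|\partial_*(i)|}\right) &= \sum_{t=0}^{\infty}\Pr(|\partial_*(1)|=t)\,\mathbb{E}\left(e^{\xi|\partial_*(i)|}\,\middle|\,|\partial_*(1)|=t\right)\\
&= \sum_{t=0}^{\infty}\Pr(|\partial_*(1)|=t)\,\mathbb{E}\left(e^{\xi|\partial_*(i)|}\,\middle|\,|\partial_*(1)|=1\right)^t\\
&= \sum_{t=0}^{\infty}\Pr(|\partial_*(1)|=t)\,\mathbb{E}\left(e^{\xi|\partial_*(i-1)|}\right)^t\\
&= \varphi\left(\log\mathbb{E}\left(e^{\xi|\partial_*(i-1)|}\right)\right).
\end{aligned}$$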
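The first part of the claim is the usual composition rule for generating functions of a Galton-Watson process. As a sanity check, the sketch below uses a toy offspring law of the same shape, $Z=c_0\cdot\mathrm{Bi}(N,p)$ with small hypothetical values of $c_0$, $N$ and $p$, and compares the recursion $\mathbb{E}(e^{\xi|\partial(i)|})=\varphi(\log\mathbb{E}(e^{\xi|\partial(i-1)|}))$ with a direct computation from the exact law of each generation size.

```python
import math
from math import comb


def offspring_pmf(c0: int, N: int, p: float) -> dict[int, float]:
    """Law of Z = c0 * Bi(N, p): each of N potential litters has c0 children."""
    return {c0 * s: comb(N, s) * p ** s * (1 - p) ** (N - s) for s in range(N + 1)}


def mgf(pmf: dict[int, float], z: float) -> float:
    """phi(z) = sum_t e^{z t} Pr(Z = t)."""
    return sum(prob * math.exp(z * t) for t, prob in pmf.items())


def next_generation(dist: dict[int, float], pmf: dict[int, float]) -> dict[int, float]:
    """Exact law of the next generation size: a sum of x i.i.d. copies of Z."""
    out: dict[int, float] = {}
    for x, px in dist.items():
        conv = {0: 1.0}  # law of a sum of x copies of Z, built by repeated convolution
        for _ in range(x):
            new: dict[int, float] = {}
            for a, pa in conv.items():
                for b, pb in pmf.items():
                    new[a + b] = new.get(a + b, 0.0) + pa * pb
            conv = new
        for y, py in conv.items():
            out[y] = out.get(y, 0.0) + px * py
    return out


def compare(c0: int = 2, N: int = 3, p: float = 0.4, xi: float = 0.05, generations: int = 3):
    """Return (recursion value, direct value) of E(e^{xi |gen i|}) for i = 1..generations."""
    pmf = offspring_pmf(c0, N, p)
    dist = {1: 1.0}        # generation 0 is a single individual
    f = math.exp(xi)       # E(e^{xi |gen 0|})
    rows = []
    for _ in range(generations):
        dist = next_generation(dist, pmf)
        f = mgf(pmf, math.log(f))  # first-generation recursion from the claim
        direct = sum(q * math.exp(xi * y) for y, q in dist.items())
        rows.append((f, direct))
    return rows


if __name__ == "__main__":
    for f_rec, direct in compare():
        print(f_rec, direct)
```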

In proving the second part of the claim, for simplicity we will actually calculate the expectation in $\mathcal{T}^*$ rather than $\mathcal{T}_*$, which amounts to ignoring the $(1-\varepsilon_*)$ terms. This is permissible since we only require an upper bound on $\mathbb{E}(e^{\xi|\partial_*(i)|})$ (and is a good approximation in any case since $\varepsilon_* \ll \varepsilon$).

We prove the second part of the claim by induction on $i$. For the case $i = 0$ the statement becomes $e^{\xi} \le 1 + C_0\xi$, which is certainly true for sufficiently large $n$ since $\xi = o(1)$ and $C_0 > 1$ is bounded away from 1. We therefore assume that the result holds for $i-1$.

One fact which we will use repeatedly is that for $i \le \tau$ we have

$$C_i = O(\log n). \tag{24}$$





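This can easily be seen by induction on $i$, since

$$\begin{aligned}
C_i &= 1 + O\left(\tau\left(C_{i-1}(1+\varepsilon)^{\tau}\xi + \varepsilon\right)\right)\\
&= 1 + O\left(\tau\left(C_{i-1}\varepsilon n^{-c/2} + \varepsilon\right)\right)\\
&= 1 + O(\tau\varepsilon)\\
&= O(\log n).
\end{aligned}$$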
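The bound (24) can also be observed numerically. Assuming the recursion $C_0=1+\delta$, $C_i=1+(1+\delta)(i+1)\bigl(\frac{c_0}{2}C_{i-1}x_i+\varepsilon\bigr)$ with $x_i=(1+\varepsilon)^i\xi$ and $\xi=\varepsilon(1+\varepsilon)^{-\tau}n^{-c/2}$, the sketch below iterates up to $i=\tau$ and reports $C_\tau/\log n$. All parameter values are illustrative assumptions; $n$ is passed through $\log n$ because the regime $n^{c/2}\gg\log n$ only sets in for very large $n$.

```python
import math


def c_sequence_final(log_n: float, eps: float, delta: float, c: float,
                     j: int, c0: int) -> tuple[float, int]:
    """Iterate C_0 = 1+delta, C_i = 1 + (1+delta)(i+1)(c0 C_{i-1} x_i / 2 + eps) up to i = tau.

    Returns (C_tau, tau). Everything is computed from log n so that huge n fits in floats.
    """
    tau = math.ceil((j - 1 + delta + c) / eps * log_n)
    # xi = eps (1+eps)^(-tau) n^(-c/2), kept in log form.
    log_xi = math.log(eps) - tau * math.log1p(eps) - (c / 2) * log_n
    C = 1 + delta
    for i in range(1, tau + 1):
        x_i = math.exp(i * math.log1p(eps) + log_xi)  # x_i = (1+eps)^i xi
        C = 1 + (1 + delta) * (i + 1) * (c0 * C * x_i / 2 + eps)
    return C, tau


if __name__ == "__main__":
    log_n = 100 * math.log(10)  # n = 10^100, an illustrative choice
    C_tau, tau = c_sequence_final(log_n, eps=0.01, delta=0.1, c=0.1, j=2, c0=2)
    print(tau, C_tau, C_tau / log_n)
```

In this example the term $(1+\delta)(i+1)\varepsilon$ dominates, so $C_\tau\approx(1+\delta)\tau\varepsilon$, which is a constant multiple of $\log n$.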


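To simplify notation we define $x_i := (1+\varepsilon)^i\xi$. Also, let

$$p_i := (1+C_{i-1}x_{i-1})^{c_0}p,$$

and note that for $i \le \tau$ we have $p \le p_i \le p_\tau = (1+o(1))p$ (and in particular $p_i \le 1$). We now have (using $t = c_0s$ above and the induction hypothesis)

$$\begin{aligned}
\mathbb{E}\left(e^{\xi|\partial_*(i)|}\right) &= \sum_{s}\binom{\binom{n-j}{k-j}}{s}p^s(1-p)^{\binom{n-j}{k-j}-s}\,\mathbb{E}\left(e^{\xi|\partial_*(i-1)|}\right)^{c_0s}\\
&\le \sum_{s}\binom{\binom{n-j}{k-j}}{s}p^s(1-p)^{\binom{n-j}{k-j}-s}(1+C_{i-1}x_{i-1})^{c_0s}\\
&\le \left(\frac{1-p}{1-p_i}\right)^{\binom{n-j}{k-j}}\sum_{s}\binom{\binom{n-j}{k-j}}{s}p_i^s(1-p_i)^{\binom{n-j}{k-j}-s}.
\end{aligned}$$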



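The terms in the sum are simply binomial probability expressions, now with a slightly different probability, and therefore their sum is 1. Thus we obtain

$$\begin{aligned}
\mathbb{E}\left(e^{\xi|\partial_*(i)|}\right) &\le \left(1+(p_i-p)(1+p_i+p_i^2+\cdots)\right)^{\binom{n-j}{k-j}}\\
&\le 1 + \binom{n-j}{k-j}(p_i-p)(1+p_i+\cdots) + \binom{\binom{n-j}{k-j}}{2}(p_i-p)^2(1+p_i+\cdots)^2 + O\left(C_{i-1}^3x_{i-1}^3\right).
\end{aligned}$$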

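Now

$$\frac{p_i}{p} - 1 = c_0C_{i-1}x_{i-1} + \binom{c_0}{2}C_{i-1}^2x_{i-1}^2 + O(C_{i-1}^3x_{i-1}^3),$$

so we have (using $\binom{n-j}{k-j}p = (1+\varepsilon)/c_0$ and the fact that $x_{i-1} \le x_i$)

$$\begin{aligned}
\mathbb{E}\left(e^{\xi|\partial_*(i)|}\right) &\le 1 + (1+\varepsilon)\left(C_{i-1}x_{i-1} + \frac{c_0-1}{2}C_{i-1}^2x_{i-1}^2 + O(C_{i-1}^3x_{i-1}^3)\right)(1+p_i+\cdots)\\
&\qquad + \frac{(1+\varepsilon)^2}{2}\left(C_{i-1}^2x_{i-1}^2 + O(C_{i-1}^3x_{i-1}^3)\right)(1+p_i+\cdots)^2 + O(C_{i-1}^3x_{i-1}^3)\\
&\le 1 + C_{i-1}x_i\left(1+\varepsilon+p_i+\frac{c_0}{2}C_{i-1}x_i + O(C_{i-1}^2x_i^2+p_i^2+\varepsilon^2)\right).
\end{aligned}$$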
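Finally, let us observe that, provided $i(1+\varepsilon)^i\xi = o(1)$, we have

$$\begin{aligned}
&C_{i-1}\left(1+\varepsilon+p_i+\frac{c_0}{2}C_{i-1}x_i + O(C_{i-1}^2x_i^2+p_i^2+\varepsilon^2)\right)\\
&\qquad= \left(1+(1+\delta)i\left(\frac{c_0}{2}C_{i-2}x_{i-1}+\varepsilon\right)\right)\left(1+\varepsilon+\frac{c_0}{2}C_{i-1}x_i + o(\varepsilon+C_{i-1}x_i)\right).
\end{aligned}$$

Using the facts that $C_{i-2} \le C_{i-1}$ and $x_{i-1} \le x_i$, this gives us

$$\begin{aligned}
&C_{i-1}\left(1+\varepsilon+p_i+\frac{c_0}{2}C_{i-1}x_i + O(C_{i-1}^2x_i^2+p_i^2+\varepsilon^2)\right)\\
&\qquad\le 1 + C_{i-1}x_i\left((1+\delta)i\frac{c_0}{2}+\frac{c_0}{2}+o(1)\right) + \varepsilon\left((1+\delta)i+1+o(1)\right)\\
&\qquad\le 1 + C_{i-1}x_i(1+\delta)(i+1)\frac{c_0}{2} + \varepsilon(1+\delta)(i+1)\\
&\qquad= C_i
\end{aligned}$$

as required. □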

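Proof of Lemma 26. Now for any number $a$, let

$$q_a := \Pr(|\partial_*(\tau)| \ge a).$$

We set $z := n^{j-1+\delta}$ and $y_i := (1+\varepsilon)^{\tau}n^{ic}/(2\varepsilon)$ for every $i = 1,2,\ldots$ and note that

$$y_1 = n^{(j-1+\delta+c)\varepsilon^{-1}\log(1+\varepsilon)}\frac{n^c}{2\varepsilon} \ge \frac{n^{j-1+\delta+3c/2}}{2\varepsilon} > z.$$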

Our main aim is to find a lower bound on $q_z$. We observe that

$$\mathbb{E}(|\partial_*(\tau)|) \le (1-q_z)z + q_zy_1 + \sum_{i=1}^{\infty}q_{y_i}y_{i+1}. \tag{25}$$

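Recall that

$$C_\tau\tau(1+\varepsilon)^{\tau}\xi = o(1),$$

and thus Markov’s inequality tells us that

$$q_{y_i} \le \frac{\mathbb{E}\left(e^{\xi|\partial_*(\tau)|}\right)}{e^{\xi y_i}} \le \frac{1+C_\tau\tau(1+\varepsilon)^{\tau}\xi}{e^{n^{(i-1/2)c}/2}} \le e^{-n^{ic/3}}$$

for $n$ sufficiently large. Hence

$$\sum_{i=1}^{\infty}q_{y_i}y_{i+1} \le \sum_{i=1}^{\infty}e^{-n^{ic/3}}\frac{(1+\varepsilon)^{\tau}n^{(i+1)c}}{2\varepsilon} \le \varepsilon^{-1}n^{j-1+\delta+c}e^{-n^{c/4}} = o(e^{-n^{c/5}}).$$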

We have thus shown that in (25) the contribution to $\mathbb{E}(|\partial_*(\tau)|)$ from the upper tail is negligible. Now (25) tells us that

$$q_zy_1 - q_zz \ge \mathbb{E}(|\partial_*(\tau)|) - z + o(e^{-n^{c/5}})$$
and therefore

$$q_z \ge \frac{(1+\varepsilon)^{\tau} - n^{j-1+\delta} + o(e^{-n^{c/5}})}{(1+\varepsilon)^{\tau}n^c/(2\varepsilon) - n^{j-1+\delta}} \ge \frac{2\varepsilon(1+\varepsilon)^{\tau}(1+o(1))}{n^c(1+\varepsilon)^{\tau}(1+o(1))} \ge \frac{\varepsilon}{n^c},$$

where we used the fact that $n^{j-1+\delta} = o((1+\varepsilon)^{\tau})$. This completes the proof of Lemma 26. □
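Since these quantities are astronomically large for realistic $n$, the following sketch works with natural logarithms. For illustrative parameter values (assumptions), it checks that $y_1>z$ and that $\xi y_1=n^{c/2}/2$, and evaluates the lower bound $q_z\ge((1+\varepsilon)^\tau-z)/(y_1-z)$ obtained from (25) when the upper-tail term is dropped and $\mathbb{E}(|\partial_*(\tau)|)$ is taken to be $(1+\varepsilon)^\tau$, as in the computation above.

```python
import math


def lemma26_quantities(log_n: float, eps: float, delta: float, c: float, j: int) -> dict:
    """Log-space versions of tau, z, y_1, xi * y_1 and the resulting lower bound on q_z."""
    tau = math.ceil((j - 1 + delta + c) / eps * log_n)
    log_growth = tau * math.log1p(eps)                     # log (1+eps)^tau
    log_z = (j - 1 + delta) * log_n                        # z = n^{j-1+delta}
    log_y1 = log_growth + c * log_n - math.log(2 * eps)    # y_1 = (1+eps)^tau n^c / (2 eps)
    log_xi = math.log(eps) - log_growth - (c / 2) * log_n  # xi = eps (1+eps)^-tau n^-c/2
    # q_z >= ((1+eps)^tau - z) / (y_1 - z), written as a difference of logs.
    log_num = log_growth + math.log1p(-math.exp(log_z - log_growth))
    log_den = log_y1 + math.log1p(-math.exp(log_z - log_y1))
    return {"tau": tau, "log_z": log_z, "log_y1": log_y1,
            "log_xi_y1": log_xi + log_y1, "log_qz_lower": log_num - log_den}


if __name__ == "__main__":
    log_n = 100 * math.log(10)  # n = 10^100, illustrative
    q = lemma26_quantities(log_n, eps=0.01, delta=0.1, c=0.1, j=2)
    print(q, "log(eps / n^c) =", math.log(0.01) - 0.1 * log_n)
```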

We now show how to deduce Lemma 24 from Lemma 26. We first consider the case $\ell = 0$. In this case we aim to prove that with very high probability

$$e\left(C_{J_1}(i_0(j-1))\right) = o(\lambda n^j).$$

We first observe that if the statement does not hold, then we certainly have $e(C_{J_1}(i_0(j-1))) \ge \lambda n^j/\omega$ for any $\omega \to \infty$, but each generation up to time $i_0(j-1)$ only has size at most $n^{j-1+\delta}$.

We now consider two cases for possibilities of how the desired conclusion might 
fail, and show that each of these is very unlikely. 

Case 1: There is a generation $i$ of size at least $n^{1/2-\delta/2+c}$.

In this case, using $\mathcal{T}_*$ as a lower coupling for the search process, we begin $y = n^{1/2-\delta/2+c}$ independent new processes at time $i$. By Lemma 26 each of these has a probability of at least $\varepsilon/n^c$ of reaching size at least $n^{j-1+\delta}$ within $\tau$ steps. The probability that $i_0(j-1) > i+\tau$ is therefore at most

$$(1-\varepsilon/n^c)^y \le \exp(-\varepsilon y/n^c) \le \exp\left(-n^{-1/2+\delta}n^{1/2-\delta/2+c}/n^c\right) = \exp(-n^{\delta/2}),$$

where we used $\varepsilon \ge n^{-1/2+\delta}$, which holds since $\varepsilon^2n^{1-2\delta} \to \infty$. We may therefore assume that Case 1 does not occur for $i \le i_0(j-1)-\tau$.
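The estimate in Case 1 combines $1-t\le e^{-t}$ with $\varepsilon\ge n^{-1/2+\delta}$. The sketch below evaluates, in log space and for illustrative values of $n$, $\delta$ and $c$ (assumptions), the exact quantity $y\log(1-\varepsilon/n^c)$ at the smallest admissible $\varepsilon=n^{-1/2+\delta}$ with $y=n^{1/2-\delta/2+c}$, and compares it with $-n^{\delta/2}$.

```python
import math


def case1_log_failure(n: float, delta: float, c: float) -> tuple[float, float]:
    """Return (log((1 - eps/n^c)^y), -n^{delta/2}) for eps = n^{-1/2+delta}, y = n^{1/2-delta/2+c}."""
    eps = n ** (-0.5 + delta)
    y = n ** (0.5 - delta / 2 + c)
    exact = y * math.log1p(-eps / n ** c)  # log of the failure probability bound (1 - eps/n^c)^y
    return exact, -(n ** (delta / 2))


if __name__ == "__main__":
    for n, delta, c in [(1e12, 0.1, 0.05), (1e20, 0.2, 0.1)]:
        print(n, case1_log_failure(n, delta, c))
```

Because $\varepsilon y/n^c=n^{\delta/2}$ exactly at this choice of $\varepsilon$, the two values are very close, with the exact one slightly smaller.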

Case 2: $i_0(j-1) \ge (\tau+1)n^{1/2-\delta/2+c}$.

To show that this is unlikely we once again aim to consider $y$ independent processes, but in order to do this we take one individual from each of the generations $i(\tau+1)$, for $i = 0,1,2,\ldots,n^{1/2-\delta/2+c}-1$, and consider the lower coupling $\mathcal{T}_*$ for the search process. Technically these are not independent processes, since one may be a subprocess of the other, but for this reason we consider the process $\mathcal{T}_*$ which is cut off after $\tau$ steps. These processes are now independent, and the same calculation as above shows that the probability that none of them become large enough after $\tau$ steps is very small. We may therefore assume that Case 2 does not occur.

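However, if neither case occurs, then we have (assuming $j \ge 2$)

$$\begin{aligned}
e\left(C_{J_1}(i_0(j-1))\right) &\le (\tau+1)n^{1/2-\delta/2+c}\cdot n^{1/2-\delta/2+c} + \tau n^{j-1+\delta}\\
&\le 2\tau\left(n^{1-\delta+2c} + n^{j-1+\delta}\right)\\
&\le n^{j-1+2\delta}/\varepsilon\\
&= o(\lambda n^j)
\end{aligned}$$

as required.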

This shows that Lemma 24 holds for $\ell = 0$. However, now we know that up to time $i_0(j-1)$ we have found few edges, which intuitively should mean that we are early in the process, and that all maximum degrees will be small. More precisely, $e(C_{J_1}(i_0(j-1))) \le \lambda n^j/\omega$ and therefore by Property the number of queries we have made in the search process is at most

Therefore applying Property (0 with $a = \lambda/\sqrt{\omega}$, we have that

$$\Delta_\ell\left(C_{J_1}(i_0(j-1))\right) = O(\lambda n^{j-\ell}/\sqrt{\omega}) = o(\lambda n^{j-\ell})$$

for every $\ell = 1,\ldots,j-1$ as required. This completes the proof of Lemma 24. □

8. Concluding Remarks

8.1. Hypertrees. In this paper, we used the fact that Lemma [G] can be applied to the search process BFS to show that $\mathcal{T}_* \prec \mathrm{BFS} \prec \mathcal{T}^*$. However, this also applies to the following variant of the search process: we define BFS2 to be the corresponding breadth-first search process in which a $k$-set may only be queried if it contains $c_0$ neutral $j$-sets (as opposed to at least one for BFS).

BFS2 is a search algorithm specifically looking for a tree, where we define a tree to be a component with $e$ edges and $c_0e+1$ $j$-sets. Note that this algorithm will not necessarily reveal all of a component; however, all the arguments involving BFS which we use in this paper also hold for BFS2. Thus we deduce that the number
of j-sets contained in large trees is approximately A ("), and indeed these (rooted) 
trees are smooth in the sense of the smooth boundary lemma. 


Since random graphs have been so extensively studied, various further questions 
immediately suggest themselves, regarding whether we can also prove similar results 
for hypergraphs. 

8.2. The critical window. Theorem 2 contains two conditions which give lower bounds on $\varepsilon$, namely $\varepsilon^3n^j \to \infty$ and $\varepsilon^2n^{1-2\delta} \to \infty$. The natural conjecture is that only the first of these should be genuinely necessary, and thus the result should hold provided $\varepsilon \gg n^{-j/3}$. Note that for $\varepsilon = O(n^{-j/3})$, the bounds from the supercritical case ($O(\varepsilon n^j)$) and the subcritical case ($O(\varepsilon^{-2}\log n)$) match up to the $\log n$ term, suggesting that we have a smooth transition.

However, our proof method requires the additional second condition, which takes over when $j \ge 2$. Thus for these cases, our range of $\varepsilon$ is probably not best possible.

This additional condition arises in the proof because we need the smooth boundary lemma. The boundary has size up to $\lambda^2n^j = o(\varepsilon^2n^j)$, and the typical degree within the boundary of an $\ell$-set is $o(\varepsilon^2n^{j-\ell})$. If we are to show that these degrees are concentrated, we need this typical degree to be large, which in the case of $\ell = j-1$ means $\varepsilon^2n$ must be large, thus leading to our condition on $\varepsilon$.

If we were to attempt to do away with this condition, we would need to have 
some control over degrees which may be very small. Presumably we would need to 
determine more precisely what the probability distribution of such degrees is. 

8.3. The distribution of the size of the giant component. The size of the 
giant component shortly after the phase transition is a random variable whose mean 
we have (asymptotically) determined in this paper, and we have further shown that 
it is concentrated around its mean. However, we have not proved what the actual 
distribution of this random variable is. 
The most likely candidate would be a normal distribution as is the case for graphs (as proved by Pittel and Wormald m and by Łuczak and Łuczak [12]). More generally for the 1-component in $k$-uniform hypergraphs, analogous results were proved for various ranges of $\varepsilon$ by Karoński and Łuczak [9] and by Behrisch, Coja-Oghlan and Kang [1], and for the whole of the supercritical regime ($\varepsilon \gg n^{-1/3}$) by Bollobás and Riordan [3]. For these cases, central and local limit theorems are known. It would be interesting to prove similar results for the size of the largest $j$-component.

8.4. The structure of the components. For graphs, it is well known that shortly after the phase transition, all components are whp trees or at most unicyclic. It would be interesting to know if something similar holds in hypergraphs. (A natural generalisation of unicyclic would be the most tree-like connected non-tree structure, i.e. a $j$-component with $e$ edges and $c_0e$ $j$-sets. In this case, when discovering the component, exactly one $j$-set would be seen twice.) It follows very easily from the results of this paper and our remarks about BFS2 that for almost every $j$-set, the component containing it is a tree. Once again, the case $j = 1$ has previously been
studied (see [9], j]). 

8.5. $\ell$-Cores. The study of cores in random graphs was initiated by Bollobás. The $\ell$-core is the (unique) largest subgraph of minimum degree at least $\ell$.

For hypergraphs, the degree has many possible generalisations just as connectivity does. For each of these, we may then define the $\ell$-core and consider its
properties. For vertex-degree, Molloy El determined the critical threshold for the 
existence of a non-empty $\ell$-core and its asymptotic size, but in general this is still
an open question. 

These questions and many more suggest themselves as the first steps in the 
developing field of random hypergraph theory. 


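Appendix A. Proof of Lemma 10

Recall that we proved only the upper bound for the first half of Lemma 10. For the lower bound, we note that since we are in the range where $\mathcal{T}_*$ will give us a lower bound, $|\partial C_{J_1}(i+1)|$ stochastically dominates a random variable $Z_{i+1}$ with the distribution of $c_0\cdot\mathrm{Bi}\left(x_i(1-\varepsilon_*)\binom{n-j}{k-j},p\right)$. Thus we have, again by the Chernoff bound (Theorem 5),

$$\begin{aligned}
\Pr\left(|\partial C_{J_1}(i+1)| \le (1-2\varepsilon_*)(1+\varepsilon)x_i\right) &\le \Pr\left(Z_{i+1} \le (1-2\varepsilon_*)(1+\varepsilon)x_i\right)\\
&\le \Pr\left(Z_{i+1} \le (1-\varepsilon_*)\mathbb{E}(Z_{i+1})\right)\\
&\le \exp\left(-\Theta(\varepsilon_*^2x_i)\right)\\
&\le \exp\left(-\Theta(n^{\delta/2})\right)
\end{aligned}$$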


provided that Xi > n 1 s . 
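For the second half, the calculation is very similar. $|\partial C_{J_1}(i+1)|$ is dominated by a random variable $Y_{i+1}$ with distribution $c_0\cdot\mathrm{Bi}\left(n^{\delta}\binom{n-j}{k-j},p\right)$, and so

$$\begin{aligned}
\Pr\left(|\partial C_{J_1}(i+1)| \ge 2n^{\delta}\right) &\le \Pr\left(Y_{i+1} \ge 2n^{\delta}\right)\\
&\le \exp\left(-\frac{(1-\varepsilon)^2n^{\delta}}{3(1+\varepsilon)c_0}\right)\\
&= \exp\left(-\Theta(n^{\delta})\right)\\
&\le \exp\left(-\Theta(n^{\delta/2})\right).
\end{aligned}$$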


□ 


Appendix B. Branching Processes


Recall that we aim to calculate the asymptotic value of the survival probability $\varrho$ of the branching process $\mathcal{T}^*$, in which the number of children has distribution $c_0\cdot\mathrm{Bi}\left(\binom{n-j}{k-j},p\right)$, where $p = (1+\varepsilon)p_0$. For technical reasons, it is slightly easier first to calculate the probability $\varrho_0$ that at least one of $c_0$ independent branching processes (each an instance of $\mathcal{T}^*$) survives. In this case the number of children in the first generation has distribution $\mathrm{Bi}\left(c_0\binom{n-j}{k-j},p\right)$.

By standard results for branching processes (see [7]) we know that $\varrho > 0$. We note that the processes all die out if and only if all subprocesses starting at children in the first generation die out. Making a case distinction on the number of such children, we obtain

$$\begin{aligned}
1-\varrho_0 &= \left(p(1-\varrho_0)+1-p\right)^{c_0\binom{n-j}{k-j}}\\
&= (1-p\varrho_0)^{c_0\binom{n-j}{k-j}}\\
&= \left(1-(1+\varepsilon)p_0\varrho_0\right)^{1/p_0}.
\end{aligned}$$


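Solving this equation for $\varepsilon$ yields

$$\varepsilon = \frac{1-(1-\varrho_0)^{p_0}}{\varrho_0p_0} - 1 = \frac{f(\varrho_0)}{\varrho_0p_0} \tag{26}$$

for

$$f(\varrho_0) = 1-\varrho_0p_0-(1-\varrho_0)^{p_0} = \frac{p_0(1-p_0)}{2}\varrho_0^2 + \frac{p_0(1-p_0)(2-p_0)}{6}\varrho_0^3 + \cdots.$$

Since this expression has only non-negative coefficients, (26) implies that

$$\varepsilon \ge \frac{p_0(1-p_0)\varrho_0^2/2}{\varrho_0p_0} = \frac{\varrho_0}{2} - O(p_0\varrho_0),$$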
and hence $\varrho_0 = o(1)$ since $\varepsilon = o(1)$ and $p_0 = o(1)$. Consequently we derive the asymptotic estimate

$$\varepsilon = \frac{\varrho_0}{2} + O(\varrho_0p_0 + \varrho_0^2) \sim \frac{\varrho_0}{2} \tag{27}$$

from (26). Since $1-\varrho_0 = (1-\varrho)^{c_0}$, we have

$$\varrho = 1-(1-\varrho_0)^{1/c_0} = \frac{\varrho_0}{c_0} + O(\varrho_0^2) \sim \frac{2\varepsilon}{c_0}. \tag{28}$$
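The asymptotics (27) and (28) can be checked numerically. The sketch below solves $1-\varrho_0=(1-(1+\varepsilon)p_0\varrho_0)^{1/p_0}$ for its non-trivial root by bisection, recovers $\varepsilon$ from (26), and compares $\varrho_0$ with $2\varepsilon$ and $\varrho=1-(1-\varrho_0)^{1/c_0}$ with $2\varepsilon/c_0$. The values of $\varepsilon$, $p_0$ and $(k,j)$ are illustrative assumptions, and bisection is simply one convenient root finder.

```python
import math
from math import comb


def survival_rho0(eps: float, p0: float, tol: float = 1e-14) -> float:
    """Non-trivial root of 1 - rho0 = (1 - (1+eps) p0 rho0)^(1/p0), found by bisection.

    h(r) = (1-(1+eps)p0 r)^(1/p0) - (1-r) is convex with h(0) = 0; for eps > 0 it is
    negative at r = eps and positive near r = 1, so bisection on [eps, 1) finds the root.
    """
    def h(r: float) -> float:
        return math.exp(math.log1p(-(1 + eps) * p0 * r) / p0) - (1 - r)

    lo, hi = eps, 1.0 - 1e-12
    assert h(lo) < 0 < h(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def eps_from_rho0(rho0: float, p0: float) -> float:
    """Equation (26): eps = f(rho0) / (rho0 p0) with f(r) = 1 - r p0 - (1-r)^{p0}."""
    f = -math.expm1(p0 * math.log1p(-rho0)) - rho0 * p0  # accurate for tiny p0
    return f / (rho0 * p0)


if __name__ == "__main__":
    eps, p0 = 0.01, 1e-6
    c0 = comb(3, 2) - 1                 # k = 3, j = 2 (illustrative)
    rho0 = survival_rho0(eps, p0)
    rho = 1 - (1 - rho0) ** (1 / c0)    # from 1 - rho0 = (1 - rho)^{c0}
    print(rho0 / (2 * eps), rho / (2 * eps / c0), eps_from_rho0(rho0, p0))
```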

Remark 28. Note that an almost identical calculation shows that the survival probability $\varrho_*$ of the lower coupling process $\mathcal{T}_*$ also satisfies

$$\varrho_* \sim \frac{2\varepsilon}{c_0}. \tag{29}$$

This is because the expected number of children of each individual is $(1-\varepsilon_*)(1+\varepsilon) = 1+\varepsilon+o(\varepsilon)$.


Recall that $\mathcal{T}_{\mathcal{D}}$ is the dual process, i.e. the process $\mathcal{T}^*$ conditioned on $\mathcal{D}$, the event that $\mathcal{T}^*$ dies. We consider the children of an individual as being grouped into litters, each containing $c_0$ children, where the number of litters has distribution $\mathrm{Bi}\left(\binom{n-j}{k-j},p\right)$ in $\mathcal{T}^*$. We want to show that $\mathcal{T}_{\mathcal{D}}$ is a branching process in which each individual has a number of litters of children which has binomial distribution.

To analyse the dual process we consider the change of the probability of $\mathcal{D}$ subject to the presence $A_e$ of a litter of children of an individual $J$. We denote by $\partial(J)$ the set of individuals in the same generation (of $\mathcal{T}^*$) as $J$ and calculate the probabilities $\Pr(\mathcal{D}\mid A_e)$ and $\Pr(\mathcal{D}\mid\neg A_e)$ by conditioning on the number of litters $s(\partial(J))$ of children of individuals in $\partial(J)$ and obtain


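$$\begin{aligned}
\Pr(\mathcal{D}\mid A_e) &= \sum_{i=0}^{M-1}\Pr\left(s(\partial(J)) = i+1\mid A_e\right)(1-\varrho)^{(i+1)c_0} = \sum_{i=0}^{M-1}\Pr\left(\mathrm{Bi}(M-1,p) = i\right)(1-\varrho)^{(i+1)c_0},\\
\Pr(\mathcal{D}\mid\neg A_e) &= \sum_{i=0}^{M-1}\Pr\left(s(\partial(J)) = i\mid\neg A_e\right)(1-\varrho)^{ic_0} = \sum_{i=0}^{M-1}\Pr\left(\mathrm{Bi}(M-1,p) = i\right)(1-\varrho)^{ic_0},
\end{aligned}$$

and so

$$\frac{\Pr(\mathcal{D}\mid A_e)}{\Pr(\mathcal{D}\mid\neg A_e)} = (1-\varrho)^{c_0}.$$

Therefore we get

$$\begin{aligned}
\Pr(A_e\mid\mathcal{D}) &= \frac{\Pr(\mathcal{D}\mid A_e)\Pr(A_e)}{\Pr(\mathcal{D}\mid A_e)\Pr(A_e) + \Pr(\mathcal{D}\mid\neg A_e)\Pr(\neg A_e)}\\
&= \frac{\frac{\Pr(\mathcal{D}\mid A_e)}{\Pr(\mathcal{D}\mid\neg A_e)}\Pr(A_e)}{\frac{\Pr(\mathcal{D}\mid A_e)}{\Pr(\mathcal{D}\mid\neg A_e)}\Pr(A_e) + \Pr(\neg A_e)}\\
&= \frac{(1-\varrho)^{c_0}p}{1-p\left(1-(1-\varrho)^{c_0}\right)}
\end{aligned} \tag{30}$$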
and note in particular that this probability is independent of the choice of $e$ and $J$, hence we denote it by $p_{\mathcal{D}}$. Moreover we obtain the estimate

$$p_{\mathcal{D}} = \frac{(1-\varrho)^{c_0}p}{1-p\left(1-(1-\varrho)^{c_0}\right)} = p(1-\varrho)^{c_0} + O(p^2\varrho).$$
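The Bayes computation leading to (30) can be mirrored directly. The sketch below evaluates $\Pr(\mathcal{D}\mid A_e)$ and $\Pr(\mathcal{D}\mid\neg A_e)$ from the two sums, combines them with $\Pr(A_e)=p$, and compares the result with the closed form $p_{\mathcal{D}}=(1-\varrho)^{c_0}p/(1-p(1-(1-\varrho)^{c_0}))$ and with the estimate $p(1-\varrho)^{c_0}$. Here $M$, $p$, $\varrho$ and $c_0$ are free illustrative inputs rather than values derived from the process.

```python
import math


def binom_pmf(m: int, p: float, i: int) -> float:
    """Pr(Bi(m, p) = i)."""
    return math.comb(m, i) * p ** i * (1 - p) ** (m - i)


def p_dual_by_bayes(M: int, p: float, rho: float, c0: int) -> float:
    """Pr(A_e | D) computed from the two conditional sums, then Bayes' rule."""
    q = (1 - rho) ** c0  # probability that the c0 subtrees of one litter all die
    pr_d_given_a = sum(binom_pmf(M - 1, p, i) * q ** (i + 1) for i in range(M))
    pr_d_given_not_a = sum(binom_pmf(M - 1, p, i) * q ** i for i in range(M))
    numerator = pr_d_given_a * p
    return numerator / (numerator + pr_d_given_not_a * (1 - p))


def p_dual_closed(p: float, rho: float, c0: int) -> float:
    """Closed form (30): (1-rho)^{c0} p / (1 - p (1 - (1-rho)^{c0}))."""
    q = (1 - rho) ** c0
    return q * p / (1 - p * (1 - q))


if __name__ == "__main__":
    for M in (5, 50):
        print(M, p_dual_by_bayes(M, 0.03, 0.1, 2), p_dual_closed(0.03, 0.1, 2))
```

Since the ratio of the two sums is exactly $(1-\varrho)^{c_0}$, the result does not depend on $M$, which matches the remark that $p_{\mathcal{D}}$ is independent of $e$ and $J$.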



Furthermore a very similar calculation shows that conditioned on $\mathcal{D}$ the presence of $e$ is still independent of all other edges. Hence the dual process $\mathcal{T}_{\mathcal{D}}$ is a branching process whose offspring distribution is given by

$c_0\cdot\mathrm{Bi}$